Development of a classifier to identify patients with probable Lennox-Gastaut syndrome in health insurance claims databases via random forest methodology
Current Medical Research and Opinion, 2019
To describe the development of a claims-based classifier that uses machine learning to identify patients with probable Lennox-Gastaut syndrome (LGS) in six state Medicaid programs.
Patients were included if they had ≥2 medical claims ≥30 days apart for specified or unspecified epilepsy, excluding those with ≥1 claim for petit mal status. The LGS classifier used a random forest algorithm, an ensemble of thousands of binary decision trees in which machine-generated predictor variables split the data set into branches that predict the presence or absence of LGS. To construct the splitting rules, the importance of each candidate variable was determined by calculating the mean decrease in Gini impurity. Training and testing were performed on two data sets (30% and 70%) using a "true" LGS and non-LGS patient population. Performance was compared with logistic regression and single-tree methodology.
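The Gini impurity decrease used to rank candidate variables can be illustrated with a minimal sketch for a single binary split. The function names, the toy labels, and the hypothetical "number of distinct antiepileptic drugs" cut-off below are assumptions made for illustration only; they are not the study's data or code. In a random forest, this per-split decrease would be averaged over every split on a variable across all trees to give the mean decrease.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k^2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def gini_decrease(labels, goes_left):
    """Impurity of the parent node minus the size-weighted impurity
    of the two child nodes produced by a binary split."""
    left = [y for y, g in zip(labels, goes_left) if g]
    right = [y for y, g in zip(labels, goes_left) if not g]
    n = len(labels)
    weighted_children = (
        (len(left) / n) * gini_impurity(left)
        + (len(right) / n) * gini_impurity(right)
    )
    return gini_impurity(labels) - weighted_children


# Toy data (hypothetical): 1 = LGS, 0 = non-LGS.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
n_distinct_aeds = [5, 4, 6, 1, 2, 1, 3, 2]

# Hypothetical splitting rule: at least 4 distinct antiepileptic drugs.
goes_left = [n >= 4 for n in n_distinct_aeds]

decrease = gini_decrease(labels, goes_left)
print(f"Gini decrease for this split: {decrease:.5f}")
```

In this toy case the split separates the two classes perfectly, so the decrease equals the full impurity of the parent node; a less informative variable would yield a smaller value.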
Using a 60% probability threshold, which yielded the highest sensitivity (97.3%) and specificity (95.6%), the classifier identified approximately 4% of patients with epilepsy as probable LGS. The most important input variables included number of distinct antiepileptic drugs received, epilepsy-related outpatient/inpatient visits, electroencephalogram procedures and claims for delayed development. The random forest methodology outperformed logistic regression and single tree methodology. Most of the important LGS predictor characteristics identified by the classifier were statistically significantly associated with LGS status (p < .05).
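How sensitivity and specificity follow from a probability threshold can be sketched as below. The function name, the toy labels and probabilities, and the choice to treat a predicted probability at or above the threshold as probable LGS are assumptions for illustration; the source does not state how ties at the threshold were handled.

```python
def sensitivity_specificity(y_true, probs, threshold=0.60):
    """Classify as positive when the predicted probability is at or above
    the threshold (an assumed convention), then compute
    sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP)."""
    tp = fn = tn = fp = 0
    for y, p in zip(y_true, probs):
        predicted_positive = p >= threshold
        if y == 1 and predicted_positive:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy data (hypothetical): 1 = LGS, 0 = non-LGS, with made-up probabilities.
y_true = [1, 1, 1, 0, 0, 0, 0]
probs = [0.90, 0.70, 0.50, 0.65, 0.20, 0.10, 0.40]

sens, spec = sensitivity_specificity(y_true, probs, threshold=0.60)
print(f"sensitivity={sens:.3f}, specificity={spec:.3f}")
```

Repeating the calculation over a grid of thresholds is one way to see how the two measures trade off before settling on a single cut-off.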
The claims-based LGS classifier showed high sensitivity and specificity, outperformed single tree and logistic regression methodologies and identified a prevalence of probable LGS that was similar to previously published estimates.