Identification of convulsive epilepsy in sub-Saharan Africa relies on access to resources that are often unavailable. Infrastructure and resource requirements can further complicate case verification. Using machine-learning techniques, we have developed and tested a region-specific questionnaire panel and predictive model to identify people who have had a convulsive seizure. We have implemented these findings in a free app for health-care workers in Kenya, Uganda, Ghana, Tanzania, and South Africa.

In this retrospective case-control study, we used data from the Studies of the Epidemiology of Epilepsy in Demographic Sites in Kenya, Uganda, Ghana, Tanzania, and South Africa. We randomly split the study participants in a 7:3 ratio into a training dataset and a validation dataset. We used information gain and correlation-based feature selection to identify eight binary features that predict convulsive seizures. We then assessed several machine-learning algorithms to create a multivariate prediction model. We validated the best-performing model with the internal-validation dataset and a prospectively collected external-validation dataset. We additionally evaluated a leave-one-site-out (LOSO) model, in which the model was trained on data from all sites except one, with each site in turn forming the validation dataset. We used these features to develop a questionnaire-based predictive panel that we implemented in a multilingual app (the Epilepsy Diagnostic Companion) for health-care workers in each geographical region.

We analysed epilepsy-specific data from 4097 people, of whom 1985 (48·5%) had convulsive epilepsy and 2112 were controls. From 170 clinical variables, we initially identified 20 candidate predictor features. Eight of these were removed, six because of negligible information gain and two following review by a panel of qualified neurologists, leaving 12 candidates.
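
As an illustration of the information-gain criterion mentioned above, the following is a minimal sketch in plain Python, not the study's actual pipeline. The function names and the toy data are assumptions made for this example; the abstract does not specify the software used. Information gain is computed here as the entropy of the case or control label minus the entropy that remains after splitting on a binary questionnaire feature.

```python
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return -sum((c / n) * log2(c / n) for c in counts.values())


def information_gain(feature, labels):
    """Reduction in label entropy obtained by splitting on a feature."""
    n = len(labels)
    groups = {}
    for x, y in zip(feature, labels):
        groups.setdefault(x, []).append(y)
    # Weighted average entropy of the labels within each feature value.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Toy data, invented for illustration only: 1 = case, 0 = control.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
informative = [1, 1, 1, 0, 0, 0, 0, 0]    # mostly tracks the label
uninformative = [1, 0, 1, 0, 1, 0, 1, 0]  # evenly spread across both classes

ig_informative = information_gain(informative, labels)
ig_uninformative = information_gain(uninformative, labels)
print(f"informative:   {ig_informative:.3f} bits")
print(f"uninformative: {ig_uninformative:.3f} bits")
```

A feature whose answers are spread evenly across cases and controls yields an information gain of zero, which is the sense in which a candidate feature can be dropped for negligible information gain.
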
Correlation-based feature selection identified eight variables that demonstrated predictive value; all but one were associated with an increased risk of an epileptic convulsion. The logistic regression, support vector, and naive Bayes models performed similarly, outperforming the decision-tree model. We chose the logistic regression model for its interpretability and implementability. The area under the receiver operating characteristic curve (AUC) was 0·92 (95% CI 0·91-0·94; sensitivity 85·0%, specificity 93·7%) in the internal-validation dataset and 0·95 (95% CI 0·92-0·98; sensitivity 97·5%, specificity 82·4%) in the external-validation dataset. Similar results were observed for the LOSO model (AUC 0·94, 95% CI 0·93-0·96; sensitivity 88·2%, specificity 95·3%).

On the basis of these findings, we developed the Epilepsy Diagnostic Companion as a predictive model and app offering a validated culture-specific and region-specific solution to confirm the diagnosis of a convulsive epileptic seizure in people with suspected epilepsy. The questionnaire panel is simple and accessible for health-care workers without specialist knowledge to administer. This tool can be iteratively updated and could lead to earlier, more accurate diagnosis of seizures and improve care for people with epilepsy.

This study was funded by the Wellcome Trust, the UK National Institute of Health Research, and the Oxford NIHR Biomedical Research Centre.
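
For readers less familiar with the reported metrics, the sketch below shows how sensitivity, specificity, and AUC can be computed from predicted scores. It is an illustrative example in plain Python under stated assumptions: the function names, the 0.5 decision threshold, and the toy scores are all hypothetical and are not taken from the study.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels and predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, scores):
    """AUC as the probability that a randomly chosen case scores higher
    than a randomly chosen control (ties count as one half)."""
    cases = [s for t, s in zip(y_true, scores) if t == 1]
    controls = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if c > k else 0.5 if c == k else 0.0
        for c in cases
        for k in controls
    )
    return wins / (len(cases) * len(controls))


# Toy data, invented for illustration only: 1 = case, 0 = control.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]

# Hypothetical decision threshold of 0.5 on the predicted probability.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} AUC={auc(y_true, scores):.4f}")
```

Sensitivity and specificity depend on the chosen threshold, whereas AUC summarises discrimination across all thresholds, which is why the two kinds of figure are reported side by side.
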