Abstract

Background

Early identification of individuals at risk for dementia is an important challenge for the design and implementation of prevention strategies. Machine learning (ML) algorithms might help identify dementia risk from claims-based electronic health records (EHR). The aim of this study was to develop and validate a new ML-based method to identify Veterans at risk for dementia using claims-based EHR data.

Method

As part of a retrospective cohort study, we propose an ML-based probabilistic method to evaluate 10-year dementia risk from claims-based EHR data. We identified veterans without dementia at baseline between January 2000 and December 2009 and followed them until December 2019 for dementia onset, as defined by ICD codes. An ML model was built from the veterans' EHR data to predict dementia occurrence during follow-up. The model used 72 EHR features previously associated with dementia, covering socio-demographics, medical and mental health morbidities, vital signs, medications, hospitalizations, and laboratory data. The ML algorithms evaluated were Linear Discrimination, Random Forest Classifier, Ensemble Bootstrapping, and a Multilayer Perceptron (MLP) neural network. The MLP provided the most robust predictions and was used for the final dementia modeling and prediction.

Result

Data came from 7,202 veterans (mean age 54.89 years, SD = 6.64, range 40–65; 90.3% male; 49.3% Caucasian; 72.1% non-Hispanic). Over a median follow-up of 9.82 years (IQR = 3.44), 786 (10.9%) of these veterans developed dementia. Training and testing sets were pseudo-randomly drawn from the original EHR database at a 9:1 ratio. Performance was evaluated on the testing set using the neural network model learned from the training set. The results of three modeling and testing experiments (overall accuracy/sensitivity/specificity) were: (1) 84.2%/12.8%/92.9%, (2) 85.7%/13.9%/94.5%, and (3) 87.5%/17.7%/96.1%.
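The Result paragraph describes a pseudo-random 9:1 split and reports overall accuracy, sensitivity, and specificity. A minimal sketch of those steps in plain Python follows. The function names and the fixed seed are hypothetical, and the split is assumed to be a simple unstratified shuffle because the abstract does not say how it was drawn. `expected_accuracy` uses the identity accuracy = prevalence × sensitivity + (1 - prevalence) × specificity, which shows how overall accuracy can stay high while sensitivity is low.

```python
import random


def split_train_test(records, test_fraction=0.1, seed=0):
    """Pseudo-randomly split records into (train, test); 9:1 by default.

    Assumption: a plain shuffle without stratification by outcome.
    """
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def classification_metrics(y_true, y_pred):
    """Return (accuracy, sensitivity, specificity) for binary labels, 1 = dementia."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    # Sensitivity: share of true dementia cases the model flags.
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    # Specificity: share of non-cases the model correctly clears.
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return accuracy, sensitivity, specificity


def expected_accuracy(prevalence, sensitivity, specificity):
    """Accuracy implied by sensitivity and specificity at a given outcome prevalence."""
    return prevalence * sensitivity + (1 - prevalence) * specificity


# Cohort prevalence 786/7202 with experiment (1): sensitivity 12.8%, specificity 92.9%.
print(round(expected_accuracy(786 / 7202, 0.128, 0.929), 3))
```

Plugging in the cohort prevalence with experiment (1)'s sensitivity and specificity gives about 0.842, in line with the reported 84.2%. Because roughly 89% of records are non-cases, specificity dominates the overall figure. The test-set prevalence may differ slightly from the cohort's, so this is only an approximation.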
The modeling and prediction were also carried out with missing data for some variables, giving: (1) 82.4%/9.84%/89.1%, (2) 82.9%/12.2%/91.1%, and (3) 83.1%/16.5%/92.0%.

Conclusion

Although the model's prediction specificity is acceptable (above 90%), its sensitivity is low (below 20%). Exploring better modeling algorithms and methods for handling imbalanced samples may improve prediction. Future research could benefit from ML techniques that identify the combination of variables that best predicts dementia.
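The conclusion calls for better treatment of unbalanced samples (only about 10.9% of the cohort developed dementia). One widely used option, which the abstract does not report trying, is random oversampling of the minority class in the training data. The sketch below is a plain-Python illustration under stated assumptions: binary labels with 1 = dementia, feature rows held in an indexable sequence, and a hypothetical helper name `random_oversample`.

```python
import random
from collections import Counter


def random_oversample(features, labels, seed=0):
    """Duplicate random minority-class rows until both classes are equally frequent.

    Assumes binary labels. Intended for the training set only, so that the
    testing set keeps its natural (imbalanced) prevalence.
    """
    counts = Counter(labels)
    if len(counts) < 2:
        return list(features), list(labels)  # nothing to balance
    rng = random.Random(seed)
    minority = min(counts, key=counts.get)
    deficit = max(counts.values()) - counts[minority]
    minority_rows = [i for i, y in enumerate(labels) if y == minority]
    # Sample minority rows with replacement to fill the gap.
    extra = [rng.choice(minority_rows) for _ in range(deficit)]
    balanced_x = list(features) + [features[i] for i in extra]
    balanced_y = list(labels) + [labels[i] for i in extra]
    return balanced_x, balanced_y


# Toy training set: 2 dementia cases (label 1) among 10 records.
X, y = random_oversample(list(range(10)), [1, 1] + [0] * 8)
print(Counter(y))
```

Oversampling only duplicates existing cases, so it can encourage overfitting to them. Weighting the minority class in the loss or lowering the decision threshold on the predicted probability are alternatives that trade some specificity for sensitivity without copying records.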
