Abstract

BackgroundIschemic stroke (IS) is a major health risk without generally usable effective measures of primary prevention. Early warning signals that are easy to detect and widely available can save lives. Estonia has one nation-wide Electronic Health Record (EHR) database for the storage of medical information of patients from hospitals and primary care providers.MethodsWe extracted structured and unstructured data from the EHRs of participants of the Estonian Biobank (EstBB) and evaluated different formats of input data to understand how this continuously growing dataset should be prepared for best prediction. The utility of the EHR database for finding blood- and urine-based biomarkers for IS was demonstrated by applying different analytical and machine learning (ML) methods.ResultsSeveral early trends in common clinical laboratory parameter changes (set of red blood indices, lymphocyte/neutrophil ratio, etc.) were established for IS prediction. The developed ML models predicted the future occurrence of IS with very high accuracy and Random Forests was proved as the most applicable method to EHR data.ConclusionsWe conclude that the EHR database and the risk factors uncovered are valuable resources in screening the population for risk of IS as well as constructing disease risk scores and refining prediction models for IS by ML.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call