Abstract

Background

The Health and Aging Brain Study: Health Disparities (HABS‐HD) seeks to understand the biological, social, and environmental factors that affect brain aging among diverse communities. Like many other NIH-funded data-sharing projects, HABS‐HD holds important data assets for various uses, including social, environmental, and behavioral data, and has multiple data flow pathways. Machine learning (ML) develops algorithms and models that improve over time, but these models operate efficiently only when the quality and readiness of the data have been determined. Developing a data readiness reporting methodology has therefore become an urgent task for HABS‐HD.

Method

We developed a conceptual framework of data readiness. First, we analyzed the percentage of missing data and used ML‐Based Multiple Imputation (MLMI) to impute missing values. Then, we applied a support vector machine (SVM) with Recursive Feature Elimination and Cross Validation (SVM‐RFE‐CV) for feature elimination and outlier removal. Lastly, we rated data readiness on three metrics: missing data percentage, performance before feature engineering, and performance after feature engineering. The three scores were averaged to rate the overall readiness of the data.

Result

We present a framework for calculating an overall average data readiness score (1 stands for completely accessible, 0 for not accessible at all, and 0.5 for neutral). Our results show that the framework is straightforward and useful for assessing how ready the HABS‐HD data are for ML.

Conclusion

Systematic analysis of data readiness before building ML models is of utmost importance, and it has a significant impact on biomarker discovery and disease prediction applications for Alzheimer’s disease. The conceptual framework works well for our Alzheimer’s disease models in HABS‐HD and can also be applied to data readiness reporting for other diseases.
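
The averaging step described in the Method can be illustrated with a minimal sketch. The abstract does not say how the missing data percentage is turned into a score, so the sketch assumes a completeness score of 1 minus the missing fraction, and it assumes both performance values are already on a 0 to 1 scale (for example, accuracy). The function names `missing_fraction` and `readiness_score` are hypothetical and are not taken from the study.

```python
from statistics import mean


def missing_fraction(rows):
    """Return the fraction of cells that are None across a list of rows."""
    total = sum(len(row) for row in rows)
    if total == 0:
        raise ValueError("no cells to score")
    missing = sum(1 for row in rows for value in row if value is None)
    return missing / total


def readiness_score(missing_frac, perf_before, perf_after):
    """Average three scores in [0, 1] into one overall readiness score.

    Assumption (not stated in the abstract): the missing-data metric is
    scored as completeness, i.e. 1 - missing fraction.
    """
    scores = [1.0 - missing_frac, perf_before, perf_after]
    for score in scores:
        if not 0.0 <= score <= 1.0:
            raise ValueError("each score must lie in [0, 1]")
    return mean(scores)


# Toy data: one of four cells is missing.
toy_rows = [[1.2, None], [0.7, 3.4]]
toy_score = readiness_score(missing_fraction(toy_rows), 0.5, 0.75)
```

On this reading, a score near 1 indicates data that are nearly complete and already support good model performance, while a score near 0.5 corresponds to the neutral midpoint named in the Result.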

