Abstract

Knowledge discovery from real-world data in health care can be demanding due to unstructured data and low registration quality in electronic health records (EHRs). This requires close collaboration between domain experts and data scientists. To perform the knowledge discovery process more effectively and efficiently, a framework for automatic Knowledge Driven Feature Engineering (aKDFE) has been developed. Central to aKDFE is automated feature engineering (FE), i.e., the automated construction of new, highly informative variables, referred to as features, from those directly observed and recorded, e.g., in EHRs. The framework learns and aggregates domain knowledge to generate features that are more informative than those recorded in EHRs or manually engineered (manual FE), as is done in medical research projects today.

Manual KDFE is a systematic manual FE process that improves prediction performance without loss of explainability of the predictions. However, the following research questions remained open: (i) is it possible to automate KDFE, (ii) are aKDFE features more informative than features from a manual FE process, and (iii) does aKDFE produce explainable and transparent results?

To summarize the present study, aKDFE is (i) more efficient than manual FE since it automates the manual knowledge discovery and FE processes. It is (ii) more effective due to its higher predictive power compared to manual KDFE. This was evaluated on a real-world medical research project by comparing the classification ability of machine learning (ML) models using manual and aKDFE features, measured as the area under the receiver operating characteristic curve (AUROC). The project, “Negative bone structure effects of antiepileptic drug consumption”, included 26,992 patients. The baseline features (manual FE) used in the studied project were compared with features generated by aKDFE; the aKDFE-generated features resulted in a higher AUROC than the baseline features, with a p-value <0.05. Finally, aKDFE (iii) applies and describes data pivoting and feature generation as explicit and transparent operation sequences on EHR features. Inefficiency issues remain, mainly regarding non-automatic FE and baseline generation.

Threats to the validity of the aKDFE framework exist in the selection of the pivoting method, the generalization of the FE operations and rules used, and the use of evaluation metrics.
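
The AUROC comparison described above can be illustrated with a minimal sketch. It assumes binary outcome labels and one predicted score per patient from each of two models; the function name `auroc`, the variable names, and the toy numbers are all hypothetical and invented for illustration, not taken from the study. The sketch uses the pairwise ranking definition of AUROC (the probability that a randomly chosen positive case scores higher than a randomly chosen negative case) and does not reproduce the study's ML models or its significance test.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative label")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Synthetic toy data, invented for illustration only (not from the study).
labels = [0, 0, 1, 1, 0, 1]
baseline_scores = [0.2, 0.6, 0.5, 0.7, 0.4, 0.3]  # hypothetical model on baseline features
akdfe_scores = [0.1, 0.4, 0.6, 0.9, 0.3, 0.5]     # hypothetical model on aKDFE features

baseline_auc = auroc(labels, baseline_scores)
akdfe_auc = auroc(labels, akdfe_scores)
print(f"baseline AUROC: {baseline_auc:.3f}, aKDFE AUROC: {akdfe_auc:.3f}")
```

The pairwise loop is quadratic in the number of patients, which is fine for a small illustration; for a cohort of the size reported here, a rank-based computation or an established library routine would be the more practical choice.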
