Abstract

Patients with inborn errors of immunity (IEI) often suffer protracted diagnostic odysseys; however, machine learning (ML) can facilitate patient detection via analysis of electronic health record (EHR) data. Although structured EHR data are available for analytic approaches, clinical-note text represents an untapped opportunity for expanding the computable phenotype of IEI to facilitate patient-finding. This study aims to quantify the utility of text-mined features for improving ML-guided IEI prediction. We mined relevant clinical concepts using EHR text from over 6,000 verified IEI patients and over 28,000 controls. Enriched IEI concepts were used to augment a Bayesian network (a probabilistic ML-model). The ML-model was then tested by analyzing EHR clinical-note text from 10 verified IEI patients. From this IEI test cohort, we extracted clinical-note terms in two ways: manually and via an automated Human Phenotype Ontology (HPO) extraction tool. Terms were then mapped to IEI-relevant concepts for each subject and fed into the ML-model. Prediction improvement was assessed by comparing algorithmic sensitivity for both concept extraction approaches and by the degree to which text-derived concepts enhanced predictions over structured data alone. Text-mining identified 11 additional IEI-relevant concepts. Addition of these concepts improved algorithmic sensitivity from 0.8 to 1 and enabled detection of 2 patients not previously identified. Mean cohort risk score difference (baseline vs. text-enhanced ML-models) was significant for both manual and automatic concept extraction experiments (0.46 ± 0.40 vs. 0.94 ± 0.10, p = 0.004 and 0.51 ± 0.44 vs. 0.85 ± 0.28, p = 0.05; threshold = 0.06).
The concordance rate between expert-determined text concepts and automatically extracted HPO terms was 21% (11/47 concepts), and concept frequency distribution differences between 'expert-derived' and 'automated-HPO' were significant for 'Exam Findings' (p = 0.021) and 'Failure to Thrive' (p = 0.011). Top concepts extracted from text include exam findings, fatigue, failure to thrive, laboratory results, medications, and infections, which improved ML-model diagnostic sensitivity by 20 percentage points. Here, we show that combining structured and unstructured EHR information allowed for the prediction of IEI in all test subjects. In addition, automated extraction of HPO terms synergistically enhances manually extracted IEI features. In summary, text-mining facilitates discovery of IEI-relevant concepts and ML-driven risk prediction.
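
The sensitivity figures reported above can be illustrated with a minimal sketch. It assumes that a verified patient counts as detected when the risk score meets or exceeds the reported threshold of 0.06; the abstract does not state whether the comparison is inclusive. The function name `sensitivity` and both score lists are hypothetical and chosen only to reproduce the reported pattern (8 of 10 detected at baseline, 10 of 10 after text enhancement). They are not the study's data or code.

```python
from typing import Sequence


def sensitivity(risk_scores: Sequence[float], threshold: float) -> float:
    """Fraction of verified cases whose risk score meets or exceeds the threshold."""
    if not risk_scores:
        raise ValueError("need at least one verified case")
    detected = sum(1 for score in risk_scores if score >= threshold)
    return detected / len(risk_scores)


# Decision threshold reported in the abstract.
THRESHOLD = 0.06

# Hypothetical risk scores for 10 verified IEI patients (illustrative only,
# NOT the study's data). Index i refers to the same patient in both lists.
baseline_scores = [0.01, 0.03, 0.20, 0.35, 0.40, 0.55, 0.70, 0.80, 0.90, 0.95]
enhanced_scores = [0.70, 0.75, 0.90, 0.95, 0.97, 0.98, 0.99, 0.99, 1.00, 1.00]

baseline_sensitivity = sensitivity(baseline_scores, THRESHOLD)
enhanced_sensitivity = sensitivity(enhanced_scores, THRESHOLD)

# Patients below the threshold at baseline who cross it after text enhancement.
newly_detected = sum(
    1
    for before, after in zip(baseline_scores, enhanced_scores)
    if before < THRESHOLD <= after
)

print(baseline_sensitivity, enhanced_sensitivity, newly_detected)
```

Because every subject in the test cohort is a verified case, sensitivity here is simply the detected fraction of the cohort; specificity cannot be estimated from such a cohort alone.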
