Abstract

Introduction: The Fontan operation palliates single-ventricle heart defects. Because native anatomy varies, Fontan cases cannot always be identified by ICD-9 or ICD-10-CM codes.

Hypothesis: We sought to train and evaluate a supervised machine learning (ML) system that identifies Fontan cases from unstructured clinical notes in a large database.

Methods: We studied 160 adult Fontan patients with available text notes, drawn from validated clinical data at a single tertiary referral center. The data set was imbalanced, with more non-Fontan cases than Fontan cases, so we created multiple datasets with positive:negative case ratios ranging from 1:2 to 1:10. Vectorized representations of the text notes were used as features. We trained a Support Vector Machine (SVM), a model commonly used for text classification, to identify Fontan cases from the notes. For each dataset, we performed a random stratified 80-20 training-testing split 10 times and report the average F1 score (the harmonic mean of recall/sensitivity and precision/positive predictive value [PPV]) for the positive class.

Results: The model achieved a mean F1 score of 0.95 for the positive class on the data split with a 1:2 positive:negative ratio. Increasing the data imbalance from 1:2 to 1:10 did not substantially affect performance: the mean F1 score over all data splits was 0.94 (SD 0.01). We also computed the precision, recall, and F1 score of ICD codes for identifying Fontan patients. Table 1 compares the performance of ICD codes alone with that of Natural Language Processing (NLP)/ML.

Conclusions: A supervised classification model detected Fontan patients from clinical notes more accurately than ICD codes. The model was robust to data imbalance across the ratios tested. These findings suggest that the model may work effectively on real-world data. Because ICD codes have high sensitivity but low PPV, applying them as a filter before NLP/ML may further improve performance.
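The evaluation protocol described above (positive:negative ratios from 1:2 to 1:10, random stratified 80-20 splits repeated 10 times, mean and SD of the positive-class F1) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' code. The helper names (`make_ratio_dataset`, `stratified_split`, `evaluate`, `cascade_predict`) and the toy notes are hypothetical. `keyword_fit` is a deliberately trivial placeholder standing in for the vectorizer plus SVM, whose implementation the abstract does not specify; in practice one might plug in, for example, scikit-learn's `TfidfVectorizer` with `LinearSVC`. `cascade_predict` illustrates the suggested ICD-code pre-filter: a note is called positive only if the ICD codes flag it and the classifier confirms it.

```python
import random
import statistics
from typing import Callable, List, Sequence, Tuple

# Hypothetical record type: (note_text, label), label 1 = Fontan, 0 = non-Fontan.
Record = Tuple[str, int]
Predictor = Callable[[str], int]


def f1_positive(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 for the positive class: harmonic mean of precision (PPV) and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is 0 (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def make_ratio_dataset(pos: List[Record], neg: List[Record], ratio: int,
                       rng: random.Random) -> List[Record]:
    """Keep every positive note and sample `ratio` negatives per positive (1:ratio)."""
    return pos + rng.sample(neg, ratio * len(pos))


def stratified_split(data: List[Record], test_frac: float,
                     rng: random.Random) -> Tuple[List[Record], List[Record]]:
    """Shuffle each class separately so train and test keep the same class ratio."""
    train: List[Record] = []
    test: List[Record] = []
    for label in (0, 1):
        group = [r for r in data if r[1] == label]
        rng.shuffle(group)
        n_test = round(test_frac * len(group))
        test += group[:n_test]
        train += group[n_test:]
    return train, test


def evaluate(data: List[Record], fit: Callable[[List[Record]], Predictor],
             repeats: int = 10, test_frac: float = 0.2,
             seed: int = 0) -> Tuple[float, float]:
    """Repeat a random stratified split; return mean and SD of positive-class F1."""
    rng = random.Random(seed)
    scores = []
    for _ in range(repeats):
        train, test = stratified_split(data, test_frac, rng)
        predict = fit(train)  # train a fresh model on this split only
        y_true = [label for _, label in test]
        y_pred = [predict(text) for text, _ in test]
        scores.append(f1_positive(y_true, y_pred))
    return statistics.mean(scores), statistics.stdev(scores)


def keyword_fit(train: List[Record]) -> Predictor:
    """PLACEHOLDER for vectorizer + SVM: ignores `train`, flags notes mentioning 'fontan'."""
    return lambda text: int("fontan" in text.lower())


def cascade_predict(icd_flag: bool, text: str, predict: Predictor) -> int:
    """ICD codes as a high-sensitivity pre-filter; the note classifier then confirms."""
    return predict(text) if icd_flag else 0


# Synthetic toy notes (hypothetical), only to make the sketch runnable.
pos_notes = [(f"s/p Fontan completion, clinic note {i}", 1) for i in range(20)]
neg_notes = [(f"routine follow-up, clinic note {i}", 0) for i in range(300)]

for ratio in (2, 5, 10):
    data = make_ratio_dataset(pos_notes, neg_notes, ratio, random.Random(ratio))
    mean_f1, sd_f1 = evaluate(data, keyword_fit)
    print(f"1:{ratio}  mean F1 = {mean_f1:.2f}  SD = {sd_f1:.2f}")
```

Because the placeholder classifier only checks for the word "Fontan", it scores perfectly on the synthetic notes. The printed numbers therefore say nothing about the study's results. Training a fresh model inside each repeat keeps test notes out of training, and sampling negatives per ratio means a 1:10 dataset needs at least ten times as many negative notes as positive ones.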
