Natural Language Processing for the Ascertainment and Phenotyping of Left Ventricular Hypertrophy and Hypertrophic Cardiomyopathy on Echocardiogram Reports

Adam N Berman, Curtis Ginder, Zachary A Sporn, Varsha Tanguturi, Michael K Hidrue, Linnea B Shirkey, Yunong Zhao, Ron Blankstein, Alexander Turchin, Jason H Wasfy

doi:10.1016/j.amjcard.2023.08.109

Abstract

Extracting and accurately phenotyping electronic health documentation is critical for medical research and clinical care. We sought to develop a highly accurate and open-source natural language processing (NLP) module to ascertain and phenotype left ventricular hypertrophy (LVH) and hypertrophic cardiomyopathy (HCM) diagnoses from echocardiogram reports within a diverse hospital network. After the initial development on 17,250 echocardiogram reports, 700 unique reports from 6 hospitals were randomly selected from data repositories within the Mass General Brigham healthcare system and manually adjudicated by physicians for 10 subtypes of LVH and diagnoses of HCM. Using an open-source NLP system, the module was formally tested on 300 training set reports and validated on 400 reports. The sensitivity, specificity, positive predictive value, and negative predictive value were calculated to assess the discriminative accuracy of the NLP module. The NLP demonstrated robust performance across the 10 LVH subtypes, with the overall sensitivity and specificity exceeding 96%. In addition, the NLP module demonstrated excellent performance in detecting HCM diagnoses, with sensitivity and specificity exceeding 93%. In conclusion, we designed a highly accurate NLP module to determine the presence of LVH and HCM on echocardiogram reports. Our work demonstrates the feasibility and accuracy of NLP to detect diagnoses on imaging reports, even when described in free text. This module has been placed in the public domain to advance research, trial recruitment, and population health management for patients with LVH-associated conditions.
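The four accuracy measures named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes each report has a binary NLP label and a binary physician-adjudicated label for a single finding. The names `ConfusionCounts`, `count_outcomes`, and `diagnostic_metrics` are hypothetical, and the toy labels are invented for illustration rather than taken from the study.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    tp: int  # NLP positive, adjudicated positive
    fp: int  # NLP positive, adjudicated negative
    tn: int  # NLP negative, adjudicated negative
    fn: int  # NLP negative, adjudicated positive


def count_outcomes(predicted, adjudicated):
    """Tally a 2x2 table from paired binary labels (one pair per report)."""
    if len(predicted) != len(adjudicated):
        raise ValueError("label lists must have the same length")
    counts = ConfusionCounts(tp=0, fp=0, tn=0, fn=0)
    for pred, truth in zip(predicted, adjudicated):
        if pred and truth:
            counts.tp += 1
        elif pred and not truth:
            counts.fp += 1
        elif not pred and not truth:
            counts.tn += 1
        else:
            counts.fn += 1
    return counts


def safe_ratio(numerator, denominator):
    """Return the ratio, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def diagnostic_metrics(c):
    """Standard definitions of the four measures from a 2x2 table."""
    return {
        "sensitivity": safe_ratio(c.tp, c.tp + c.fn),
        "specificity": safe_ratio(c.tn, c.tn + c.fp),
        "ppv": safe_ratio(c.tp, c.tp + c.fp),
        "npv": safe_ratio(c.tn, c.tn + c.fn),
    }


# Toy labels (hypothetical, not study data): True means the finding is present.
nlp_labels = [True, True, False, False, True]
physician_labels = [True, False, False, True, True]

counts = count_outcomes(nlp_labels, physician_labels)
metrics = diagnostic_metrics(counts)
```

In a multi-label setting such as the 10 LVH subtypes, one plausible approach is to repeat this calculation once per subtype. How the study aggregated results into its overall figures is not stated in the abstract.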

Full Text