Abstract
This study aimed to develop a natural language processing (NLP) tool to extract forced vital capacity (FVC) values from electronic health record (EHR) notes in patients with rheumatoid arthritis-associated interstitial lung disease (RA-ILD). We selected RA-ILD patients (n = 7485) in the Veterans Health Administration (VA) between 2000 and 2020 using validated ICD-9/10 codes. We identified numeric values in proximity to FVC string patterns in clinical notes in the EHR. We then applied processing steps to account for variability in note structure, related pulmonary function test (PFT) output, and values copied across notes, and assigned dates from linked administrative procedure records. NLP-derived FVC values were compared with values recorded directly from PFT equipment, which were available for a subset of patients. We identified 5911 FVC values (n = 1844 patients) from PFT equipment and 15 383 values (n = 4982 patients) by NLP. Among 2610 date-matched FVC values from NLP and PFT equipment, 95.8% differed by no more than 5 %-predicted points. The mean (SD) difference was 0.09% (5.9), values were strongly correlated (r = 0.94, p < 0.001), and precision was 0.87 (95% CI 0.86, 0.88). NLP captured more patients with longitudinal FVC values (n = 3069 vs. n = 1164). Mean (SD) change in FVC %-predicted per year was similar between sources (-1.5 [30.0] NLP vs. -0.9 [16.6] PFT equipment; standardized response mean = 0.05 for both). NLP of EHR notes increases the capture of accurate, longitudinal FVC values nearly three-fold over PFT equipment. Use of this NLP tool can facilitate pharmacoepidemiologic research in RA-ILD and other lung diseases by capturing this critical measure of disease severity.
More From: Pharmacoepidemiology and drug safety