Abstract

Introduction: Seventy percent of lung cancer patients are diagnosed at advanced stages. Lung cancer screening (LCS) can potentially produce a stage shift through early detection of the disease. The 2013 LCS guideline from the U.S. Preventive Services Task Force (USPSTF) recommended screening with low-dose computed tomography (LDCT) for individuals aged 55 to 80 with a 30 pack-year smoking history who currently smoke or have quit within the past 15 years. However, the high false-positive rate of LCS with LDCT is one of the concerns that hinder the uptake of LCS in real-world settings. An electronic health record (EHR)-based computable phenotyping (CP) algorithm that accurately identifies patients who meet the LCS eligibility criteria could extend the reach of screening to the eligible population and thereby increase the uptake of LCS.

Objective: To develop an EHR-based CP algorithm to identify patients eligible for LCS.

Method: The LCS CP algorithm was developed to extract quantitative smoking information (i.e., pack-years, smoking years, quit year) from both structured EHR data and unstructured clinical notes, the latter enabled by advanced natural language processing (NLP) methods. The study cohort consisted of 3,080 patients who received LCS with LDCT, identified by procedure codes in EHR data from the UF Health Integrated Data Repository (IDR). The algorithm comprised two modules: one that extracts smoking information from structured EHR data and clinical notes using NLP techniques, and one that integrates the extracted results according to the CP rules (e.g., pack-year > 30; quit year within 15 years; age 55-80) to determine whether a patient is eligible for LCS. For an initial evaluation, we conducted a chart review of 20 randomly selected patients and compared the CP algorithm outcomes with the chart review results.
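The rule-integration step described in the Method can be illustrated with a minimal Python sketch. This is not the study's implementation: the `SmokingRecord` fields, the `is_lcs_eligible` function, and its parameter names are hypothetical, and the sketch assumes the extraction module has already produced one merged record per patient. It also assumes an inclusive pack-year threshold (at least 30), following the guideline wording, and returns `None` when the record is too incomplete to decide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SmokingRecord:
    """Hypothetical merged output of the extraction module for one patient."""
    age: int
    pack_years: Optional[float]  # None when pack-years are not documented
    current_smoker: bool
    quit_year: Optional[int]     # None for current smokers or when undocumented


def is_lcs_eligible(
    rec: SmokingRecord,
    screening_year: int,
    min_pack_years: float = 30.0,    # assumed inclusive threshold
    max_years_since_quit: int = 15,
    min_age: int = 55,
    max_age: int = 80,
) -> Optional[bool]:
    """Return True/False for eligibility, or None if it cannot be determined."""
    # Age is always available, so an out-of-range age is a definite exclusion.
    if not (min_age <= rec.age <= max_age):
        return False
    # Missing pack-years leaves eligibility undetermined.
    if rec.pack_years is None:
        return None
    if rec.pack_years < min_pack_years:
        return False
    # Current smokers with sufficient pack-years qualify.
    if rec.current_smoker:
        return True
    # Former smokers need a documented quit year to apply the 15-year rule.
    if rec.quit_year is None:
        return None
    return (screening_year - rec.quit_year) <= max_years_since_quit


if __name__ == "__main__":
    patient = SmokingRecord(age=62, pack_years=35.0, current_smoker=False, quit_year=2010)
    print(is_lcs_eligible(patient, screening_year=2020))
```

Keeping the thresholds as parameters, rather than hard-coding them, lets the same rule be re-run under revised criteria by changing only `min_pack_years` or the age bounds.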
Results and Discussion: The manual chart review of the 20 patients who underwent LCS with LDCT identified 13 patients as qualified for LCS and 6 as not qualified; eligibility could not be determined for 1 patient. Against this gold-standard dataset, the CP algorithm achieved a specificity of 1.00 and a sensitivity of 0.92. Without the smoking information extracted from clinical notes using NLP, specificity dropped to 0.80. These results indicate that clinical notes are an important source of information on smoking history. Of the smoking-related information extracted from clinical notes, smoking history was consistent with the structured EHR data in 60% of cases and inconsistent in 10% of cases, with the remaining 30% missing. Our results point to (1) suboptimal documentation of smoking information in EHRs, (2) the added value of artificial intelligence methods such as NLP in improving CP performance, and (3) the potential of an EHR-based CP to accurately identify patients eligible for LCS, with potential relevance to clinical decision support. Because the upcoming USPSTF LCS guideline is changing (i.e., from 30 pack-years to 20 pack-years), the CP needs to be refined to reflect the changes.

Citation Format: Shuang Yang, Tianchen Lyu, Xi Yang, Yonghui Wu, Yi Guo, Michelle Alvarado, Hiren J. Mehta, Ramzi G. Salloum, Dejana Braithwaite, Jinhai Huo, Ya-Chen Tina Shih, Jiang Bian. Developing a computable phenotype to identify populations eligible/ineligible for lung cancer screening [abstract]. In: Proceedings of the AACR Virtual Special Conference on Artificial Intelligence, Diagnosis, and Imaging; 2021 Jan 13-14. Philadelphia (PA): AACR; Clin Cancer Res 2021;27(5_Suppl):Abstract nr PO-092.