Abstract P068: A hybrid modelling approach for abstracting CT imaging indications by integrating natural language processing from radiology reports with structured data from electronic health records

Aparajita Khan, Summer S Han, Solomon Henry, Su-Ying Liang, Ann Leung, Anna Graber-Naidich, Heather A Wakelee, Julie Wu, Leah M Backhus, Allison W Kurian, Curtis Langlotz, Eunji Choi

doi:10.1158/1940-6215.precprev22-p068

Abstract

Background: Real-world evidence (RWE) studies of surveillance patterns following lung cancer (LC) diagnosis can inform efforts to optimize surveillance recommendations and practice. One major obstacle in RWE studies of LC surveillance is the lack of a recorded radiologic imaging indication, i.e., whether a scan was ordered for surveillance or for other reasons (e.g., symptoms). To enable RWE studies of surveillance to detect second primary lung cancer among LC survivors, we developed a hybrid modelling approach that integrates structured data from electronic health records (EHRs) with natural language processing (NLP) of radiology reports to abstract computed tomography (CT) imaging indications in LC survivors.

Methods: We manually reviewed and abstracted CT imaging indications, i.e., surveillance vs. others (e.g., symptoms and metastatic disease follow-up), to create a gold standard from 200 randomly selected radiology reports among 1,952 LC patients (i) who were diagnosed in 2000-2017 at Stanford Health Care (SHC) and (ii) who survived ≥5 years after the diagnosis. We extracted medically relevant key-phrases using part-of-speech grammar and the PageRank algorithm. Hierarchical clustering identified the following context-specific key-phrase clusters: “surveillance”, “stable”, “nodule”, “symptom”, and “metastasis”. The text-based radiology reports were vectorized to generate NLP features using phrase occurrence frequencies. The structured variables from EHRs included: (i) diagnosis of lung diseases or chest symptoms in the previous 6 months, (ii) ordering provider type (oncology vs. others [e.g., emergency and internal medicine]), and (iii) time from the previous CT (≥6 months). A hybrid model including both structured and NLP features was then fitted using logistic regression and validated using 10-fold cross-validation. The model’s performance was compared with that of models based solely on NLP or structured data.

Results: The dataset of 200 radiology reports included 141 LC survivors (49% White, 72% adenocarcinoma). The proposed hybrid model showed high discrimination (AUC: 0.92), outperforming the models based solely on NLP (AUC: 0.88) or structured data (AUC: 0.87). The proposed model demonstrated higher sensitivity (SN: 0.73) and equal or higher specificity (SP: 0.96) compared with the models based solely on NLP (SN: 0.53; SP: 0.96) or structured data (SN: 0.53; SP: 0.90). In the hybrid model, the following variables were associated with a higher likelihood that the given CT imaging indication is “surveillance”: (i) a longer time interval (≥6 months) from the previous CT (odds ratio [OR]: 4.63; p=0.01) and the key-phrases (ii) “nodule” (OR: 1.55; p=0.004) and (iii) “stable” (OR: 1.37; p=0.03). Conversely, the key-phrases “symptom” (OR: 0.17; p=0.02) and “metastasis” (OR: 0.26; p=0.02) were associated with a lower likelihood of surveillance.

Conclusion: A hybrid modelling approach combining text-based NLP and structured EHR data has the potential for abstracting CT imaging indications for LC surveillance. Future directions include validation using other EHR systems and extension using larger data.

Citation Format: Aparajita Khan, Julie Wu, Eunji Choi, Anna Graber-Naidich, Solomon Henry, Heather A. Wakelee, Allison W. Kurian, Su-Ying Liang, Ann Leung, Curtis Langlotz, Leah M. Backhus, Summer S. Han. A hybrid modelling approach for abstracting CT imaging indications by integrating natural language processing from radiology reports with structured data from electronic health records. [abstract]. In: Proceedings of the AACR Special Conference: Precision Prevention, Early Detection, and Interception of Cancer; 2022 Nov 17-19; Austin, TX. Philadelphia (PA): AACR; Can Prev Res 2023;16(1 Suppl): Abstract nr P068.
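
The Methods describe vectorizing each radiology report by key-phrase occurrence frequencies and combining those counts with three structured EHR variables before fitting a logistic regression. The sketch below illustrates only that feature-construction step, under stated assumptions: the five cluster names come from the abstract, but the member phrases in `CLUSTERS`, the function names, and the example report are hypothetical, and the authors' actual key-phrase extraction (part-of-speech grammar, PageRank, hierarchical clustering) is not reproduced here.

```python
import re

# Hypothetical lexicon: the abstract names the five key-phrase clusters but
# not their member phrases, so every phrase listed here is illustrative.
CLUSTERS = {
    "surveillance": ["surveillance", "annual follow up"],
    "stable": ["stable", "unchanged"],
    "nodule": ["nodule", "nodules"],
    "symptom": ["cough", "chest pain", "shortness of breath"],
    "metastasis": ["metastasis", "metastatic"],
}


def nlp_features(report_text):
    """Count whole-phrase occurrences per cluster, in CLUSTERS order."""
    text = report_text.lower()
    counts = []
    for phrases in CLUSTERS.values():
        total = 0
        for phrase in phrases:
            # Word boundaries keep "nodule" from also matching "nodules".
            pattern = r"\b" + re.escape(phrase) + r"\b"
            total += len(re.findall(pattern, text))
        counts.append(total)
    return counts


def structured_features(lung_dx_prior_6mo, oncology_provider, months_since_prev_ct):
    """Encode the three structured EHR variables from the abstract as 0/1."""
    return [
        int(lung_dx_prior_6mo),          # lung disease or chest symptom dx in prior 6 months
        int(oncology_provider),          # ordering provider is oncology (vs. others)
        int(months_since_prev_ct >= 6),  # >= 6 months since the previous CT
    ]


def hybrid_features(report_text, lung_dx_prior_6mo, oncology_provider, months_since_prev_ct):
    """Concatenate NLP counts and structured indicators into one feature row."""
    return nlp_features(report_text) + structured_features(
        lung_dx_prior_6mo, oncology_provider, months_since_prev_ct
    )


# Hypothetical report text, not taken from the study data.
example_report = (
    "Stable 4 mm nodule in the right upper lobe. "
    "No new nodules. Routine surveillance CT."
)
row = hybrid_features(example_report, False, True, 12)
print(row)  # [1, 1, 2, 0, 0, 0, 1, 1]
```

Rows built this way could be passed to any logistic regression implementation and evaluated with 10-fold cross-validation, as the abstract describes. Dropping either half of the row gives the NLP-only and structured-only comparison models.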

Journal: Cancer Prevention Research	Publication Date: Jan 1, 2023
Citations: 1

Similar Papers

Natural Language Processing and the Promise of Big Data: Small Step Forward, but Many Miles to Go.
Thomas M Maddox ... Michael A Matheny
Circulation. Cardiovascular quality and outcomes | VOL. 8
18 Aug 2015

Automating Access to Real-World Evidence
Marie-Pier Gauthier ... Natasha B Leighl
JTO Clinical and Research Reports | VOL. 3
17 May 2022

Real-world treatment response in Japanese patients with cancer using unstructured data from electronic health records
Kenji Araki ... Naohiro Yonemoto
Health and Technology | VOL. 13
16 Feb 2023

Informing Patient Surveillance for the Growing Number of Survivors of Lung Cancer
Kevin Ten Haaf
Journal of Thoracic Oncology | VOL. 17
22 Feb 2022