Patients with gliomas, particularly glioblastomas (GBM), have a higher risk of developing venous thromboembolism (VTE), including pulmonary embolism (PE) and deep vein thrombosis (DVT), both of which correlate with overall survival (OS) and may carry biological signaling relevance for analysis of the tumor state in conjunction with evolving biomarkers. Artificial intelligence (AI) approaches that employ VTE as a clinical feature in brain tumor patients are understudied because of the difficulty of analyzing unstructured clinical data in electronic health records (EHR). Expanding the data by creating a word lexicon for natural language processing (NLP) of free-text clinical reports will expose VTE for classification in large-scale data sets, NLP, and AI. Patients with a pathologic diagnosis of GBM (2005-2021) were extracted from the EHR and screened for the development of VTE based on free-text radiology reports: ultrasound (US) of the extremities and computed tomography pulmonary angiogram (CT). Language data were collected manually and verified computationally using radiology report text filtration. Commonly used phrases and words were collected to generate a lexicon for future computational analysis. Kaplan-Meier survival analyses were generated for VTE in relation to OS and progression-free survival (PFS). A total of 163 patients (mean age = 56.1 ± 12.1 years, 65% male) were included; 48 (29.4%) were screened for VTE following clinical suspicion on history or physical exam, and 15 (9.2%) were found to have a VTE. Screening was by US in 83.3% (40), by CT in 13.9% (6), or by both in 4.6% (2). A positive VTE diagnosis resulted from 28.6% (12) of US and 37.5% (3) of CT examinations. Terms detected only in patients with VTE were, for US, "complete" (44%), "thrombosis detected" (8.3%), "occlusive" (16%), "partial" (16%), "critical" (25%), and "residual" (8.3%), and, for CT, "critical" (67%) and "pulmonary emboli" (33%). Words used in free-text reports showed a high degree of inter-reporter variability.
The words "partial", "residual", "complete", "critical", or "clotted", combined in a Boolean "or" statement and applied to US and CT radiology reports, identified ∼93% of the patients with VTE. Patients with VTE had worse OS (median 14 vs. 19 months, p = .0189) and PFS (median 6 vs. 9 months, p = .0239) than patients without VTE, suggesting an underlying pathology associated with both VTE prevalence and tumor burden, and supporting the need to incorporate VTE as a clinical feature in large-scale data analyses. US and CT yield a similar percentage of positive VTE findings while employing different terms to characterize VTE. We confirm that patients with VTE have poorer outcomes and present a word combination that identifies patients with VTE in large-scale radiology report data. Further research will focus on validation in other datasets, optimization of NLP approaches, and data aggregation with laboratory data and large-scale omics panels to expose thrombosis as a clinical feature for biomarker analysis.
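The Boolean "or" statement described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the five terms come from the abstract, but the function names, the case-insensitive whole-word matching, and the sample report strings are assumptions made for illustration only.

```python
import re

# The five terms named in the abstract's Boolean "or" statement.
VTE_TERMS = ("partial", "residual", "complete", "critical", "clotted")

# Assumption: case-insensitive, whole-word matching, so that
# "incomplete" or "completely" does not count as "complete".
_VTE_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(term) for term in VTE_TERMS) + r")\b",
    re.IGNORECASE,
)


def matched_terms(report_text):
    """Return the sorted, lowercased lexicon terms found in a report."""
    return sorted({m.group(0).lower() for m in _VTE_PATTERN.finditer(report_text)})


def flag_report(report_text):
    """Return True if the report contains at least one lexicon term."""
    return bool(matched_terms(report_text))


if __name__ == "__main__":
    # Hypothetical report snippets, invented for illustration only.
    samples = [
        "Partial occlusive thrombus in the left femoral vein.",
        "Normal compressibility of the examined veins.",
    ]
    for text in samples:
        print(flag_report(text), matched_terms(text))
```

A plain keyword match of this kind does not handle negation or context (for example, a report stating that no residual thrombus is seen would still be flagged), so flagged reports would still need review or a more context-aware NLP step.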