32 Background: We previously showed that a text-mining approach can identify clinical prognostic factors from electronic medical records (EMR) in patients with advanced cancers (1). Here we examine whether clinical narratives can also be used to build prognostic tools by applying a machine-learning (ML) approach.

Methods: We conducted a retrospective study of all patients with stage IV tumors at a single tertiary cancer centre. The text corpus was formed from narratives extracted from initial consultation letters written by oncologists, and a feature-learning pipeline (2) was then used to extract text features correlated with survival. Five classes of ML algorithms were then applied to predict survival. Classification performance was assessed by stratified cross-validation and compared with that of Eastern Cooperative Oncology Group (ECOG) performance scores.

Results: EMR were available for analysis for 4791 of 7043 patients from 2001 to 2017, and ECOG performance scores were also available for 2211 of these patients. Applying ML to features extracted from EMR text predicted survival at 2, 6, 12, 26, 52, and 80 weeks with areas under the receiver operating characteristic (ROC) curve of 0.82, 0.80, 0.77, 0.72, 0.72, and 0.76, respectively. ML outperformed ECOG score in predicting prognosis between 12 and 16 weeks (p < 0.05) and after 52 weeks (p < 0.05), and was non-inferior at all other time points. Random forest was the best-performing algorithm for this prognostic classification task. The feature-filtering threshold had a significant effect on classification accuracy (p < 0.001).

Conclusions: In patients with advanced cancers, ML analysis of clinical narratives can be used to automate prognostication with greater accuracy than is currently obtainable from ECOG status.
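The evaluation design above (one binary "alive at week t" classifier per time point, scored by ROC AUC under stratified cross-validation) can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the authors' pipeline: the helper names are hypothetical, the abstract does not say how patients censored before a time point were handled (here they are assumed to be excluded), and the text features are assumed to already be numeric vectors produced by some upstream feature-learning step. The toy `fit_centroid` scorer stands in for the random forest or other classifiers the study compared.

```python
import random
from collections import defaultdict


def landmark_label(followup_weeks, died, t):
    """Hypothetical helper: label 'alive at week t' from survival data.

    Returns 1 if follow-up reaches t, 0 if the patient died before t, and
    None if censored before t (assumed excluded; not stated in the abstract).
    """
    if followup_weeks >= t:
        return 1
    return 0 if died else None


def roc_auc(labels, scores):
    """ROC AUC via the Mann-Whitney statistic: the probability that a random
    positive case outscores a random negative case (ties count one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def stratified_folds(labels, k, seed=0):
    """Split indices into k folds, dealing each class out round-robin so every
    fold keeps roughly the overall class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for idx in by_class.values():
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def cross_validated_auc(features, labels, fit, k=5, seed=0):
    """Mean ROC AUC over k stratified folds. `fit(train_X, train_y)` must
    return a function mapping one feature vector to a survival score."""
    aucs = []
    for test_idx in stratified_folds(labels, k, seed):
        test_set = set(test_idx)
        train = [i for i in range(len(labels)) if i not in test_set]
        score = fit([features[i] for i in train], [labels[i] for i in train])
        aucs.append(roc_auc([labels[i] for i in test_idx],
                            [score(features[i]) for i in test_idx]))
    return sum(aucs) / len(aucs)


def fit_centroid(train_X, train_y):
    """Toy stand-in for a real classifier (e.g. a random forest): scores a
    vector by its projection onto (mean of survivors - mean of non-survivors)."""
    dims = len(train_X[0])

    def mean(rows):
        return [sum(r[d] for r in rows) / len(rows) for d in range(dims)]

    mu_pos = mean([x for x, y in zip(train_X, train_y) if y == 1])
    mu_neg = mean([x for x, y in zip(train_X, train_y) if y == 0])
    w = [p - n for p, n in zip(mu_pos, mu_neg)]
    return lambda x: sum(wi * xi for wi, xi in zip(w, x))


if __name__ == "__main__":
    # Synthetic one-feature example, not study data.
    X = [[0.1], [0.2], [0.3], [0.4], [0.6], [0.7], [0.8], [0.9]]
    y = [0, 0, 0, 0, 1, 1, 1, 1]
    print(cross_validated_auc(X, y, fit_centroid, k=2))
```

In practice one would repeat this for each time point (2, 6, 12, 26, 52, and 80 weeks), rebuilding labels with a function like `landmark_label`, and would typically use library implementations such as scikit-learn's `StratifiedKFold`, `RandomForestClassifier`, and `roc_auc_score` rather than the hand-rolled versions shown here. Stratification matters because early time points (e.g. 2 weeks) have few deaths, so unstratified folds could contain no negative cases and make AUC undefined.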