e13591 Background: Detecting disease progression in patients with early-stage cancer is a challenge in real-world data derived from electronic health records (EHR). Temporal patterns shared across patients’ data can be used to identify patients with metastatic progression. A deep learning approach was applied to longitudinal patient records of cancer lab testing, visit diagnosis codes, and cancer treatments to identify metastatic status. Methods: ICD-10-CM diagnosis codes, clinical lab values specific to cancer testing, and antineoplastic treatments were evaluated from longitudinal data of 128,614 patients who did not progress and 5,884 who did develop metastasis, across 47 tumor types in the Syapse Learning Health Network. Patients who were metastatic at diagnosis were excluded. Data were included beginning at the time of primary cancer diagnosis and, for all surviving patients, ending at an established administrative cutoff date. In the metastatic group, visits after the progression event were included but were censored if they contained diagnosis codes specifically for secondary malignant neoplasms. A binary classifier was developed using a multi-head attention transformer to determine metastatic status. Each patient's history was aggregated sequentially into visit-level embeddings across the longitudinal record. Categorical layers were used to embed antineoplastics and diagnosis codes. Linear layers were used to embed lab values, as well as visit intervals relative to the primary cancer diagnosis date and the administrative cutoff date. All embeddings were used as feature inputs for the classifier. To account for the smaller population of patients who progressed to metastatic status, the non-metastatic population was randomly sampled before training to establish equally weighted labels in an 80:10:10 split for training, validation, and testing sets. Results: This classification approach achieved a PR-AUC of 0.86, a ROC-AUC of 0.79, and an F1 score of 0.75.
This performance is comparable to that of published models trained for prediction in single-tumor cohorts. Given our censoring constraints, the model's results are robust to the absence of routine EHR signals of metastasis, such as staging reports and diagnosis coding for distant metastases. Conclusions: This method generalizes accurate classification of metastatic progression from patient visit history across multiple cancer types. This work is immediately useful for real-world evidence analyses, complementing the patients with metastases already captured in hospital registries. By applying sequential deep learning to EHR data classes, this approach may be used to forecast metastatic progression for early intervention.
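
The class-balancing step described in the Methods (random sampling of the non-metastatic population to obtain equally weighted labels in an 80:10:10 split) could be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the function names `split_80_10_10` and `balanced_split`, the fixed seed, the patient identifiers, and the choice to split each class separately so that every partition stays balanced are all assumptions made for the example.

```python
import random


def split_80_10_10(ids, rng):
    """Shuffle ids and cut them into 80% / 10% / 10% partitions (hypothetical helper)."""
    ids = list(ids)
    rng.shuffle(ids)
    n_train = len(ids) * 80 // 100
    n_val = len(ids) * 10 // 100
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]


def balanced_split(non_metastatic_ids, metastatic_ids, seed=0):
    """Downsample the majority class, then build balanced train/val/test sets.

    Returns three lists of (patient_id, label) pairs, where label 1 means the
    patient progressed to metastatic status and label 0 means they did not.
    """
    rng = random.Random(seed)
    non_metastatic_ids = list(non_metastatic_ids)
    metastatic_ids = list(metastatic_ids)
    if len(non_metastatic_ids) < len(metastatic_ids):
        raise ValueError("expected the non-metastatic group to be the larger one")

    # Assumption: sample without replacement down to the minority-class size.
    sampled_negatives = rng.sample(non_metastatic_ids, k=len(metastatic_ids))

    # Assumption: split each class on its own so every partition is 50/50.
    negative_parts = split_80_10_10(sampled_negatives, rng)
    positive_parts = split_80_10_10(metastatic_ids, rng)

    splits = []
    for negatives, positives in zip(negative_parts, positive_parts):
        part = [(pid, 0) for pid in negatives] + [(pid, 1) for pid in positives]
        rng.shuffle(part)  # mix the two classes within the partition
        splits.append(part)
    return tuple(splits)


if __name__ == "__main__":
    # Toy identifiers; the real cohort sizes in the abstract are far larger.
    negatives = [f"neg-{i}" for i in range(1000)]
    positives = [f"pos-{i}" for i in range(100)]
    train, val, test = balanced_split(negatives, positives, seed=42)
    print(len(train), len(val), len(test))
```

Downsampling trades away most of the non-metastatic records in exchange for balanced labels; class weighting in the loss would be an alternative, but the abstract describes sampling, so the sketch follows that.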