Abstract

512

Background: Machine learning models that predict survival time for patients with cancer can be useful in the clinic. Validating their performance in deployment is important but challenging, because the only timely source of follow-up and death data is the electronic medical record (EMR), which is known to under-capture deaths, resulting in informative censoring. We examined whether validation using EMR data can distinguish between low- and high-quality models, using gold-standard Cancer Registry data to calculate the true level of model performance.

Methods: This was a retrospective study of 3417 patients diagnosed with metastatic cancer from 2008 to 2014, who were either diagnosed with cancer or received their initial treatment at our institution. We used regularized logistic regression on 525 demographic and diagnostic features to predict whether each patient survived for >1 year following diagnosis. We trained the model on high-quality Cancer Registry data from 2197 patients diagnosed between 2008 and 2012 and tested it on two sets of patients. The first set consisted of 1184 patients who had Cancer Registry data evaluable for the 1-year survival endpoint (not lost to follow-up prior to 1 year). Of these, 335 were marked as lost to follow-up prior to 1 year in the EMR dataset, leaving a second set of 849 patients evaluable for the endpoint using the EMR data.

Results: The model using all available features yielded an AUC of 0.760 when validated with Cancer Registry data (n=1184), and an AUC of 0.771 when validated with EMR data (n=849). When using fewer features, the observed AUC dropped for both validation data sources. With Cancer Registry validation data the model was found to be well calibrated, but with EMR validation data the model incorrectly appeared to systematically underpredict survival.

Conclusions: EMR data was useful for validation of model discrimination, but not calibration.
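The contrast between discrimination and calibration reported above can be illustrated with a small simulation. The sketch below is not the authors' data, model, or code: it uses synthetic patients, a hypothetical function name (`simulate_validation`), and a hypothetical parameter (`death_capture_rate`). It assumes that the EMR misses a random fraction of deaths and that those patients are excluded as lost to follow-up, which is a simplification of real informative censoring. Under that assumption the rank-based AUC changes little, because removing deaths at random does not alter how survivors and deaths are ordered, while the observed survival rate among the remaining patients rises above the predicted rate.

```python
import math
import random


def auc(scores, labels):
    """Rank-based AUC (Mann-Whitney U), using average ranks for ties."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def mean(values):
    return sum(values) / len(values)


def simulate_validation(seed=0, n=20000, death_capture_rate=0.6):
    """Compare validation on complete data with validation on data missing deaths.

    Assumption: each patient has a true 1-year survival probability, and the
    model predicts exactly that probability (a perfectly calibrated model).
    """
    rng = random.Random(seed)
    preds, survived = [], []
    for _ in range(n):
        p = 1.0 / (1.0 + math.exp(-rng.gauss(0.0, 1.5)))
        preds.append(p)
        survived.append(1 if rng.random() < p else 0)

    # "Registry" view: every patient is evaluable.
    # "EMR" view (assumed mechanism): a death is recorded only with probability
    # death_capture_rate; otherwise the patient is treated as lost to
    # follow-up and dropped from the evaluable set.
    emr_preds, emr_survived = [], []
    for p, y in zip(preds, survived):
        if y == 1 or rng.random() < death_capture_rate:
            emr_preds.append(p)
            emr_survived.append(y)

    return {
        "registry_auc": auc(preds, survived),
        "emr_auc": auc(emr_preds, emr_survived),
        "registry_predicted": mean(preds),
        "registry_observed": mean(survived),
        "emr_predicted": mean(emr_preds),
        "emr_observed": mean(emr_survived),
    }


if __name__ == "__main__":
    for name, value in simulate_validation().items():
        print(f"{name}: {value:.3f}")
```

In this toy setting the two AUC values should be close, whereas the observed survival in the EMR-style subset exceeds the mean predicted survival, so a well-calibrated model would look as if it underpredicts survival. Real EMR censoring is unlikely to be random among deaths, so the size of either effect in practice may differ.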
