Abstract

8571

Background: Recent advances in machine learning (ML) have improved the accuracy of cancer prognosis prediction. For an ML algorithm to be suitable for clinical practice across many facilities, its creation method must be shown to perform well on data from institutions that were not used for training. However, no large-scale study has validated the inter-institutional generalizability of such algorithms.

Methods: We conducted a multicenter, retrospective, hospital-based cohort study of consecutive patients diagnosed with stage IV lung cancer between January 2016 and December 2020. Patients were randomly assigned to a training cohort and an independent test cohort at a 2:1 ratio. The primary performance metric was the area under the receiver operating characteristic curve (AUC) in the independent test cohort. To assess inter-institutional generalizability, we trained the algorithm on data from 15 facilities and evaluated its ability to predict outcomes for patients at the remaining facility, repeating this for each of the 16 facilities.

Results: We enrolled 6,751 patients with stage IV lung cancer from 16 institutions in Japan. The median age was 70 years, and 4,421 (65%) had a performance status (PS) of 0 or 1. In total, 1,515 (22%) patients had an EGFR mutation as their driver oncogene. In the test cohort, AUC values ranged from 0.83 to 0.90 across prediction time points of 180 to 1,080 days. Across the 16 algorithms used to investigate inter-institutional generalizability, the median AUCs were 0.87 (range, 0.84–0.92), 0.84 (range, 0.78–0.88), 0.84 (range, 0.76–0.89), and 0.84 (range, 0.75–0.90) at 180, 360, 720, and 1,080 days, respectively. At 180 days, the AUC exceeded 0.8 for all 16 facility-specific algorithms (16/16); at 360, 720, and 1,080 days, it exceeded 0.8 for 15/16, 12/16, and 12/16 facilities, respectively, indicating a high degree of inter-institutional generalizability of the algorithm creation method.

Conclusions: This large-scale study showed that our ML algorithm can accurately predict prognosis in patients with stage IV lung cancer and that the algorithm creation method has a high degree of inter-institutional generalizability. Therefore, our ML algorithm can be generalized to other hospitals and implemented in clinical practice to benefit patients across diverse healthcare facilities.
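The abstract does not name the ML model or say how patients censored before a given time point were handled. The Python sketch below is therefore only an illustration, under stated assumptions, of the evaluation protocol it outlines: a random 2:1 training/test split, an AUC for "death by day t" at 180, 360, 720, and 1,080 days, and leave-one-facility-out validation in which each of 16 facilities is held out in turn. The synthetic cohort, the logistic-regression stand-in, the hypothetical helper names (`make_synthetic_cohort`, `binary_labels`, `horizon_auc`, `leave_one_facility_out`), and the rule of dropping patients censored before a time point are all assumptions, not the authors' method.

```python
# Minimal sketch on SYNTHETIC data; NOT the authors' pipeline.
# Assumptions: logistic regression stands in for the unnamed ML model, and
# patients censored before a time point are dropped for that time point.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import LeaveOneGroupOut, train_test_split

HORIZONS = (180, 360, 720, 1080)  # prediction time points (days) from the abstract


def make_synthetic_cohort(n=4000, n_sites=16, seed=0):
    """Hypothetical cohort: 5 random features, exponential survival times,
    uniform censoring, and a random facility ID (0..n_sites-1) per patient."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, 5))
    risk = X @ np.array([0.8, -0.5, 0.3, 0.0, 0.2])
    true_time = rng.exponential(scale=1000 * np.exp(-risk))
    censor_time = rng.uniform(0, 3000, size=n)
    event = (true_time <= censor_time).astype(int)  # 1 = death observed
    time = np.minimum(true_time, censor_time)       # observed follow-up (days)
    site = rng.integers(0, n_sites, size=n)
    return X, time, event, site


def binary_labels(time, event, horizon):
    """y = 1 if death observed by `horizon`, 0 if followed beyond `horizon`.
    keep = False for patients censored before `horizon` (status unknown)."""
    died = (time <= horizon) & (event == 1)
    alive = time > horizon
    return died | alive, died.astype(int)


def horizon_auc(X_tr, t_tr, e_tr, X_te, t_te, e_te, horizon):
    """Fit the stand-in model for one horizon; return AUC on the test patients."""
    keep_tr, y_tr = binary_labels(t_tr, e_tr, horizon)
    keep_te, y_te = binary_labels(t_te, e_te, horizon)
    if np.unique(y_te[keep_te]).size < 2:
        return float("nan")  # AUC is undefined when only one class is present
    model = LogisticRegression(max_iter=1000)
    model.fit(X_tr[keep_tr], y_tr[keep_tr])
    scores = model.predict_proba(X_te[keep_te])[:, 1]  # P(death by horizon)
    return float(roc_auc_score(y_te[keep_te], scores))


def leave_one_facility_out(X, time, event, site, horizon):
    """Train on all facilities but one, test on the held-out one; dict site -> AUC."""
    aucs = {}
    for tr, te in LeaveOneGroupOut().split(X, groups=site):
        held_out = int(site[te][0])
        aucs[held_out] = horizon_auc(X[tr], time[tr], event[tr],
                                     X[te], time[te], event[te], horizon)
    return aucs


if __name__ == "__main__":
    X, time, event, site = make_synthetic_cohort()
    # Random 2:1 split into training and independent test cohorts.
    X_tr, X_te, t_tr, t_te, e_tr, e_te = train_test_split(
        X, time, event, test_size=1 / 3, random_state=0)
    for h in HORIZONS:
        auc = horizon_auc(X_tr, t_tr, e_tr, X_te, t_te, e_te, h)
        lofo = np.array(list(leave_one_facility_out(X, time, event, site, h).values()))
        n_ok = int(np.sum(lofo > 0.8))  # NaN compares as False
        print(f"{h} days: test AUC {auc:.2f}; facility AUC median "
              f"{np.nanmedian(lofo):.2f} (range {np.nanmin(lofo):.2f}-{np.nanmax(lofo):.2f}); "
              f"AUC > 0.8 at {n_ok}/{lofo.size} facilities")
```

Because the data are random, the printed numbers say nothing about the study's findings; the sketch only shows how a per-facility AUC, its median and range, and the count of facilities exceeding 0.8 could be computed.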
