Objectives
Predicting the prognosis of lung cancer is crucial for providing optimal medical care. However, no method has been established to accurately predict overall prognosis in patients with stage IV lung cancer, even with machine learning. Moreover, the inter-institutional generalizability of such algorithms remains unexplored. This study aimed to develop machine learning-based algorithms with inter-institutional generalizability to predict prognosis in these patients.

Materials and Methods
This multicenter, retrospective, hospital-based cohort study included consecutive patients with stage IV lung cancer, who were randomly assigned to training and independent test cohorts in a 2:1 ratio. The primary metric of algorithm performance was the area under the receiver operating characteristic curve (AUC) in the independent test cohort. To assess inter-institutional generalizability, we trained the algorithms on data from 15 facilities and evaluated their ability to predict patient outcomes at the remaining facility.

Results
Overall, 6,751 patients (median age, 70 years) were enrolled, of whom 1,515 (22%) harbored epidermal growth factor receptor mutations. The median overall survival was 16.6 months (95% confidence interval [CI], 15.9–17.5). In the test cohort, the AUCs for predicting survival at 180, 360, 720, and 1,080 days were 0.90 (95% CI, 0.88–0.91), 0.85 (0.84–0.87), 0.83 (0.81–0.85), and 0.85 (0.82–0.87), respectively. In the inter-institutional analysis of 16 algorithms, the median AUCs at 180, 360, 720, and 1,080 days were 0.87 (range, 0.84–0.92), 0.84 (0.78–0.88), 0.84 (0.76–0.89), and 0.84 (0.75–0.90), respectively.

Conclusion
We developed machine learning algorithms that accurately predicted prognosis in patients with stage IV lung cancer and showed high inter-institutional generalizability. These algorithms can enhance the accuracy of prognosis prediction and support informed, shared decision-making in clinical settings.
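
The abstract does not specify the learning algorithms or the exact AUC estimator, so the following is only a minimal sketch of how the leave-one-facility-out evaluation could be organized. It assumes that each horizon (180, 360, 720, or 1,080 days) is scored as a binary outcome, with patients censored before the horizon excluded. This is a simplification; censoring-weighted time-dependent AUC estimators are a common alternative. It also assumes a per-patient risk score produced by some model. All function names (`rank_auc`, `horizon_labels`, `leave_one_facility_out`, `summarize`) are hypothetical illustrations, not the study's code.

```python
import statistics
from typing import Dict, List, Sequence, Tuple


def rank_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Mann-Whitney estimate of the ROC AUC: the probability that a randomly
    chosen positive case scores higher than a randomly chosen negative case
    (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def horizon_labels(
    times: Sequence[float], events: Sequence[int], horizon: float
) -> Tuple[List[int], List[int]]:
    """Binary outcome at a fixed horizon in days (assumed labeling scheme).

    Label 1: death observed on or before `horizon`.
    Label 0: follow-up extends beyond `horizon`.
    Patients censored on or before `horizon` are dropped because their
    status at the horizon is unknown. Returns (kept indices, labels)."""
    kept, labels = [], []
    for i, (t, died) in enumerate(zip(times, events)):
        if t <= horizon and died:
            kept.append(i)
            labels.append(1)
        elif t > horizon:
            kept.append(i)
            labels.append(0)
    return kept, labels


def leave_one_facility_out(
    facility_ids: Sequence[str],
) -> Dict[str, Tuple[List[int], List[int]]]:
    """For each facility, train on all other facilities and test on it.
    Returns {held-out facility: (train indices, test indices)}."""
    splits = {}
    for held_out in sorted(set(facility_ids)):
        train = [i for i, f in enumerate(facility_ids) if f != held_out]
        test = [i for i, f in enumerate(facility_ids) if f == held_out]
        splits[held_out] = (train, test)
    return splits


def summarize(aucs: Sequence[float]) -> Tuple[float, float, float]:
    """Median and range (min, max) of per-facility AUCs."""
    return statistics.median(aucs), min(aucs), max(aucs)
```

In use, one would fit a model on each split's training indices and score the held-out facility's patients. One would then apply `horizon_labels` at each horizon, compute `rank_auc` on the retained patients, and pass the 16 per-facility AUCs for a given horizon to `summarize`. That yields a median and range in the same form as the inter-institutional results above.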