Abstract

This research aims to support precision medicine by accurately predicting the incidence of lung cancer. Random forest screening was used to identify several variables that significantly influence lung cancer, and SMOTE oversampling was used to address data imbalance. A CatBoost model, which handles categorical features, was then constructed. Analysis of the experimental data shows that, among the selected variables, age has the greatest importance. The model was then validated, and the training score was 0.9032, indicating good results. A confusion matrix was also established, further confirming that the model has good predictive performance. Finally, under cross-validation, both the optimal validation accuracy score and the accurate validation accuracy score were above 0.9, indicating excellent model performance. The innovation of this paper is to realize precision medicine through high-accuracy prediction of lung cancer.
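
The oversampling step mentioned in the abstract can be illustrated with a minimal sketch of the core SMOTE idea: each synthetic minority sample is placed at a random point on the segment between a minority sample and one of its nearest minority neighbors. This is not the authors' code and not the imbalanced-learn implementation. The function name `smote_oversample`, the toy data, and the second feature are hypothetical, and only numeric features are handled (categorical features would need a variant such as SMOTE-NC).

```python
import math
import random


def smote_oversample(minority, n_new, k=3, seed=0):
    """Return n_new synthetic points interpolated between minority-class neighbors.

    Simplified illustration of SMOTE for numeric features only (assumption:
    all features are continuous; this is not a library implementation).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Find the k nearest minority neighbors of the base point (Euclidean distance).
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbor = rng.choice(others[:k])
        # Place the new point at a random position on the segment base -> neighbor.
        gap = rng.random()
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbor)))
    return synthetic


# Hypothetical toy data: (age, second numeric feature) for the minority class.
minority = [(61.0, 30.0), (67.0, 42.0), (72.0, 38.0), (58.0, 25.0)]
majority_count = 10  # hypothetical size of the majority class

# Generate enough synthetic samples to match the majority class size.
new_points = smote_oversample(minority, majority_count - len(minority))
balanced_minority = minority + new_points
print(len(balanced_minority))  # 10
```

Because every synthetic point is an interpolation between two real minority samples, it always lies within the range of the observed minority data. In practice, oversampling is usually applied only to the training split so that validation scores are not inflated by synthetic samples.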
