This study analyzes prostate tumor cases collected worldwide and classifies them with several machine learning models. The work covers data preprocessing, feature selection, and model training, and handles the biochemical data and the pathological data separately. During model training, several algorithms were explored, including XGBoost, Random Forest, LightGBM, and CatBoost. A stacking model ultimately reached 89% test accuracy. For the pathological data, the pipeline imputed missing values, cleaned and segmented the text, and paired CountVectorizer features with multiple classifiers. LightGBM performed best, with 82% accuracy. A BERT model, trained on the segmented text with a focal loss, reached 80.39% accuracy. Overall, the study shows that machine learning offers a practical aid to the diagnosis and treatment of prostate tumors and provides robust models for predicting related diseases.
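The abstract mentions optimizing BERT with focal loss but does not give the loss settings. The sketch below shows the standard multiclass focal loss, FL(p_t) = -α_t (1 - p_t)^γ log(p_t), computed from raw logits. It down-weights examples the model already classifies confidently, which is the usual reason to prefer it over plain cross-entropy on imbalanced labels. The function names, the default γ = 2.0, and the optional per-class `alpha` weights are illustrative assumptions, not values reported by the study.

```python
import math
from typing import Optional, Sequence


def log_softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable log-softmax: shift by the max logit before exponentiating."""
    m = max(logits)
    log_total = math.log(sum(math.exp(z - m) for z in logits))
    return [z - m - log_total for z in logits]


def focal_loss(
    logits: Sequence[Sequence[float]],
    targets: Sequence[int],
    gamma: float = 2.0,  # assumed focusing parameter, not taken from the study
    alpha: Optional[Sequence[float]] = None,  # hypothetical per-class weights
) -> float:
    """Mean multiclass focal loss over a batch.

    FL(p_t) = -alpha_t * (1 - p_t)**gamma * log(p_t), where p_t is the softmax
    probability of the true class. With gamma = 0 and alpha = None this
    reduces to ordinary cross-entropy.
    """
    if len(logits) != len(targets):
        raise ValueError("logits and targets must have the same length")
    losses = []
    for row, t in zip(logits, targets):
        log_p_t = log_softmax(row)[t]  # log-probability of the true class
        p_t = math.exp(log_p_t)
        a_t = 1.0 if alpha is None else alpha[t]
        # (1 - p_t)**gamma shrinks the loss of confidently correct examples.
        losses.append(-a_t * (1.0 - p_t) ** gamma * log_p_t)
    return sum(losses) / len(losses)


if __name__ == "__main__":
    batch_logits = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]
    batch_targets = [0, 2]
    print("cross-entropy:", focal_loss(batch_logits, batch_targets, gamma=0.0))
    print("focal (gamma=2):", focal_loss(batch_logits, batch_targets, gamma=2.0))
```

In a real BERT fine-tuning run, the same formula would be applied to the classifier's output logits in the training framework's tensor library rather than in pure Python. This version only makes the arithmetic explicit.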