Abstract

e13536 Background: The use of artificial intelligence in breast cancer diagnosis has had a significant impact. As an alternative to other tests, machine learning (ML) models are low-cost tools that can establish an objective prognosis by considering all the clinical, genomic, and histological data from patients. The aim of this study was to evaluate the performance of 12 ML models in predicting the overall survival of breast cancer patients at 60 months of follow-up.
Methods: Clinical data were obtained from "The Cancer Genome Atlas-Breast Cancer" database. Preprocessing excluded variables with insufficient values or poor relevance. The data were split in an 80/20 ratio into training and testing sets. The models used were: Logistic Regression (LR), Ridge Classifier (RC), Least Absolute Shrinkage and Selection Operator (LASSO), K-nearest Neighbors (KNN), Naive Bayes (NB), Linear Discriminant Analysis (LDA), Decision Tree (DT), Multilayer Perceptron (MP), Stochastic Gradient Descent (SGD), Support Vector Machine (SVM), Random Forest (RF), and XGBoost (XGB). Initial training used the default hyperparameters of each model, and performance was evaluated through 5-fold cross-validation. The results were then used with the GridSearchCV tool to optimize the hyperparameters for the final training and testing of the models.
Results: The models with the highest accuracy were NB (86.76%) and SGD (85.29%); those with the highest sensitivity were SGD (93.75%), MP (91.67%), and NB (89.89%); those with the highest specificity were KNN (95.0%), DT (90.0%), and LASSO (85.0%); and those with the highest area under the ROC curve were LDA (90.83%), LASSO (90.1%), and LR (89.79%). The variables with the highest relevance were: "Previous diagnosis of cancer", "Presence of tumor", "Ancillary Therapy", and "Histology". Integrating the best-performing models into an interactive tool preserves their best features without sacrificing efficiency.
Conclusions: The models with the best overall performance in predicting the prognosis of patients with breast cancer were NB and LASSO. The performance of a predictive prognostic tool is best evaluated by its area under the ROC curve. For developing countries, an AI tool to predict patient outcomes is a viable option when expensive genomic prognostic tools are not available. Further studies with data from developing countries are needed to improve the performance of this tool.
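
The four metrics reported in the Results (accuracy, sensitivity, specificity, and area under the ROC curve) can be illustrated with a minimal sketch in plain Python. This is not the authors' code: the function names are hypothetical, the toy labels and scores are invented purely for illustration, and the choice of which outcome counts as the positive class (label 1) is an assumption, since the abstract does not state it.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, tn, fp, fn) for binary labels, where 1 is the assumed positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of all cases classified correctly."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


def sensitivity(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of positive cases that were predicted positive."""
    tp, _, _, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn)


def specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of negative cases that were predicted negative."""
    _, tn, fp, _ = confusion_counts(y_true, y_pred)
    return tn / (tn + fp)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise ranking.

    Equals the probability that a randomly chosen positive case receives a
    higher score than a randomly chosen negative case (ties count as 0.5).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Invented toy data: 4 positive and 4 negative cases with model scores.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0]
    scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
    # Threshold of 0.5 is an arbitrary illustrative cut-off.
    y_pred = [1 if s >= 0.5 else 0 for s in scores]

    print("accuracy:", accuracy(y_true, y_pred))
    print("sensitivity:", sensitivity(y_true, y_pred))
    print("specificity:", specificity(y_true, y_pred))
    print("ROC AUC:", roc_auc(y_true, scores))
```

Accuracy, sensitivity, and specificity depend on the chosen decision threshold, whereas the area under the ROC curve is computed from the scores themselves and summarizes performance across all thresholds.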
