Abstract

The differences between the 6th stages of breast cancer are subtle, and doctors’ judgement alone is not sufficient to determine the 6th stage accurately. The 6th stage denotes the level of breast cancer development and represents the current status of the cancer. It is therefore crucial to determine it correctly so that the corresponding treatment can be given. Incorrect categorization of the 6th stage and the resulting misuse of treatments can be catastrophic, and there are currently no models to help doctors predict the 6th stage. This paper uses the SEER Breast Cancer Data dataset, which includes features such as race, T stage, and N stage. It proposes random forest and K-Nearest Neighbor (KNN) models trained on features related to the patients and their cancer. After data normalization, the random forest model achieved 99% precision, recall, and F1 score. The only mistake this model made was in differentiating stage IIIA from stage IIIB. The KNN model achieved an accuracy of 95% after normalization. The results show that the random forest model is better suited for predicting the 6th stage. With 99% accuracy, the random forest model can effectively help doctors determine the 6th stage in difficult cases.
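
The abstract does not show the implementation, so the following is only a minimal sketch of the general idea behind one of the two approaches: min-max normalization followed by KNN classification. It uses only the Python standard library. The feature columns (`tumor_size_mm`, `n_stage_code`), the toy rows, and the helper names are hypothetical, invented for illustration. They are not taken from the SEER data or from the paper's code, and the sketch does not reproduce the reported results.

```python
import math
from collections import Counter


def fit_min_max(rows):
    """Return the per-column minimum and maximum of a list of feature rows."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]


def scale(row, lows, highs):
    """Rescale one row to [0, 1] per column; constant columns map to 0.0."""
    return [
        (v - lo) / (hi - lo) if hi > lo else 0.0
        for v, lo, hi in zip(row, lows, highs)
    ]


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training rows closest to the query."""
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy rows: [tumor_size_mm, n_stage_code]. Not clinical data.
train_x = [[10, 1], [15, 1], [20, 1], [60, 3], [70, 3], [80, 3]]
train_y = ["IIA", "IIA", "IIA", "IIIC", "IIIC", "IIIC"]

# Fit the scaling on the training rows only, then apply it everywhere.
lows, highs = fit_min_max(train_x)
scaled_x = [scale(row, lows, highs) for row in train_x]

query = scale([25, 1], lows, highs)
prediction = knn_predict(scaled_x, train_y, query, k=3)
print(prediction)
```

Normalization matters for KNN because the Euclidean distance would otherwise be dominated by whichever feature has the largest numeric range. In practice, a library such as scikit-learn provides equivalent building blocks, including a random forest classifier, and would likely be preferred over hand-written code like this.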
