Abstract

Breast cancer is a leading cause of death among women worldwide, and diagnosing it early is both difficult and important. This paper designs prediction models based on Decision Tree, Random Forest, and Linear Discriminant Analysis, respectively, to predict the occurrence of breast cancer at an early stage. The design focuses on analyzing the minimum set of attributes, chosen by experts from the clinical data set, and then comparing the performance of the models to identify the advantages of each. The Breast Cancer Wisconsin Data Set is used to construct the prediction models. Two measures are used to gauge model performance: classification accuracy, determined by comparing actual values with predicted values, and generalization ability, reflected by the Receiver Operating Characteristic curve. The results show that the Random Forest model achieves the highest accuracy, 99.4%, and the best generalization ability. However, the Linear Discriminant Analysis model maintains more stable prediction accuracy and has the fastest running speed.
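
The two evaluation measures named above can be illustrated with a minimal sketch. It is not the paper's implementation: the function names `accuracy` and `roc_auc`, the toy labels, the scores, and the 0.5 decision threshold are all assumptions chosen for illustration. Accuracy compares actual labels with predicted labels, and the area under the Receiver Operating Characteristic curve is computed here through its pairwise-ranking interpretation.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the actual labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def roc_auc(y_true, scores):
    """Area under the ROC curve.

    Computed as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case,
    with ties counted as one half.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not taken from the Breast Cancer Wisconsin Data Set:
# 1 = malignant, 0 = benign; scores stand in for a model's predicted probability.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.8, 0.4, 0.5, 0.1, 0.7, 0.3]

# Assumed decision threshold of 0.5 to turn scores into class predictions.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

acc = accuracy(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"accuracy = {acc:.4f}, ROC AUC = {auc:.4f}")
```

Accuracy depends on the chosen threshold, whereas the area under the curve summarizes ranking quality across all thresholds, which is one reason the two measures are reported together.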
