Abstract

Breast cancer is currently the second leading cause of cancer death among women, and in recent years it has become the most common cancer. Data mining can support the early detection and diagnosis of breast cancer. Classification is one of several data mining tasks, and the decision tree is among the most commonly used classification methods; C4.5 is a decision tree algorithm that is frequently used for this purpose. This study used the Breast Cancer Wisconsin (Original) Data Set (1992) from the UCI Machine Learning Repository. Its purpose was to select the features to be used and to address the class imbalance in the data, so that the C4.5 algorithm would perform better in classification. Particle swarm optimization (PSO) was applied for feature selection, and bagging was applied to address class imbalance. Classification performance was evaluated with a confusion matrix to compute accuracy. Applying PSO for feature selection and bagging for class imbalance together with C4.5 increased accuracy by 5.11 percentage points, from 93.43% to 98.54%.
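To make two of the ingredients named above more concrete, here is a minimal sketch, assuming discrete attribute values (the Wisconsin (Original) attributes are integer-coded), of how C4.5 scores a candidate split by gain ratio (information gain divided by split information) and how accuracy is read off a two-class confusion matrix. It is not the authors' implementation: it omits C4.5's threshold search for continuous attributes, pruning, and the PSO and bagging stages. The helper names `entropy`, `gain_ratio`, and `accuracy_from_confusion` and the toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels):
    """C4.5-style gain ratio for splitting on one discrete attribute.

    values[i] is the attribute value of sample i; labels[i] is its class.
    (Hypothetical helper; continuous-attribute thresholds are not handled.)
    """
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Information gain: parent entropy minus weighted entropy of the children.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    # Split information penalizes attributes that fragment the data into many values.
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    return gain / split_info if split_info > 0 else 0.0


def accuracy_from_confusion(tp, fn, fp, tn):
    """Accuracy = correctly classified samples / all samples (2x2 confusion matrix)."""
    return (tp + tn) / (tp + fn + fp + tn)


# Toy usage (hypothetical data, not from the study):
print(gain_ratio(["low", "low", "high", "high"],
                 ["benign", "benign", "malignant", "malignant"]))  # → 1.0
```

In a full pipeline, PSO would search over feature subsets and bagging would train several such trees on bootstrap samples and combine their votes; both are left out here to keep the sketch short.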
