Abstract

With recent advances in clinical technology, a large volume of data has accumulated for breast cancer diagnosis. Extracting information from this data to support clinical diagnosis is tedious and time-consuming. Machine learning and data mining techniques have significantly changed the process of breast cancer diagnosis. In this research, a breast cancer prediction model has been developed using features extracted from individual medical screenings and tests. To overcome the problem of overfitting and obtain good prediction accuracy, Linear Discriminant Analysis (LDA) is applied to extract useful features, reducing the number of features in the experimental dataset. LDA creates new features from the existing ones and then discards the originals; the newly created features summarize the necessary information contained in the original feature set. LDA was chosen because of its usefulness in detecting whether a set of features is worthwhile for predicting breast cancer. In addition to LDA, the proposed model uses a Support Vector Machine (SVM) for prediction, hence the name LDA-SVM prediction model. Based on 5-fold cross-validation, the proposed model yields an accuracy of 99.2%, a precision of 98.0%, and a recall of 99.0% when tested on the Wisconsin Diagnostic Breast Cancer (WDBC) dataset from the University of California, Irvine (UCI) Machine Learning Repository. These results indicate that SVM handles classification problems effectively when combined with feature extraction techniques.
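The LDA-SVM idea described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes scikit-learn and its bundled copy of the WDBC data (`load_breast_cancer`), and the feature scaling step, the RBF kernel, the `C` and `gamma` values, and the shuffled stratified folds are all assumptions, since the abstract does not state them. The scores it prints should therefore not be expected to match the figures reported above.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# WDBC as bundled with scikit-learn: 569 samples, 30 numeric features.
X, y = load_breast_cancer(return_X_y=True)
# scikit-learn codes malignant as 0 and benign as 1; flip the labels so that
# malignant is the positive class for precision and recall.
y = 1 - y

# LDA replaces the 30 original features with a single discriminant feature
# (at most n_classes - 1 = 1 component for a two-class problem); the SVM is
# then trained on that extracted feature only.
pipeline = make_pipeline(
    StandardScaler(),                            # assumption: not stated in the abstract
    LinearDiscriminantAnalysis(n_components=1),
    SVC(kernel="rbf", C=1.0, gamma="scale"),     # assumption: kernel and hyperparameters
)

# 5-fold cross-validation; fitting the whole pipeline inside each fold keeps
# the LDA projection from seeing the held-out samples.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_validate(
    pipeline, X, y, cv=cv, scoring=("accuracy", "precision", "recall")
)

for metric in ("accuracy", "precision", "recall"):
    print(f"{metric}: {scores['test_' + metric].mean():.3f}")
```

Wrapping LDA and the SVM in one pipeline matters for the overfitting concern raised above: because LDA is supervised, fitting it on the full dataset before cross-validation would leak label information from the test folds into the extracted feature.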
