Abstract

Machine learning algorithms are increasingly in demand for early prediction of the signs and symptoms of breast cancer. One such algorithm is K-nearest neighbor (KNN), which classifies a sample by measuring its distance to other data points. The performance of KNN depends on the number of neighbors considered, known as the K value. This study explores KNN performance across various distance functions and K values to find an effective KNN configuration. The Wisconsin breast cancer (WBC) and Wisconsin diagnostic breast cancer (WDBC) datasets from the UC Irvine machine learning repository were the main data sources. The experiments on each dataset comprised three iterations: the first used no feature selection; the second used L1-norm-based selection from a model, with a linear support vector classifier as the selector; and the third used Chi-square-based feature selection. The implemented techniques were assessed with several evaluation metrics, including accuracy, sensitivity, and the receiver operating characteristic (ROC) curve with its area under the curve (AUC). The results indicated that Chi-square-based feature selection achieved the highest accuracy with the Canberra or Manhattan distance functions on both datasets. The optimal K values for these distance functions ranged from 1 to 9. The study indicates that, with an appropriate choice of K value and distance function, KNN with Chi-square-based feature selection on the WBC datasets achieves higher accuracy than the existing models.

Abbreviations: KNN: K-nearest neighbor; Chi2: Chi-square; WBC: Wisconsin breast cancer
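
To illustrate the kind of comparison the abstract describes, the following is a minimal sketch of KNN evaluated over several K values with the Manhattan and Canberra distance functions. It is not the authors' implementation: the function names (manhattan, canberra, knn_predict, accuracy, sweep) are hypothetical, the six training points are made-up toy data rather than WBC or WDBC records, feature selection is omitted, and the Canberra function assumes the common convention of skipping terms where both coordinates are zero.

```python
from collections import Counter


def manhattan(a, b):
    """Manhattan (L1) distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def canberra(a, b):
    """Canberra distance: sum of |x - y| / (|x| + |y|).

    Assumption: terms where both coordinates are zero are skipped (0/0).
    """
    total = 0.0
    for x, y in zip(a, b):
        denom = abs(x) + abs(y)
        if denom > 0:
            total += abs(x - y) / denom
    return total


def knn_predict(train_X, train_y, query, k, distance):
    """Return the majority label among the k training points nearest to query."""
    ranked = sorted(zip(train_X, train_y),
                    key=lambda pair: distance(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def accuracy(train_X, train_y, test_X, test_y, k, distance):
    """Fraction of test points whose predicted label matches the true label."""
    hits = sum(knn_predict(train_X, train_y, q, k, distance) == t
               for q, t in zip(test_X, test_y))
    return hits / len(test_y)


# Toy data for illustration only (NOT taken from the WBC or WDBC datasets).
TRAIN_X = [(1, 1), (2, 1), (1, 2), (8, 9), (9, 8), (9, 9)]
TRAIN_Y = ["benign", "benign", "benign",
           "malignant", "malignant", "malignant"]
TEST_X = [(2, 2), (8, 8)]
TEST_Y = ["benign", "malignant"]


def sweep(ks=(1, 3, 5)):
    """Accuracy for every (distance name, K) pair on the toy data."""
    distances = {"manhattan": manhattan, "canberra": canberra}
    return {(name, k): accuracy(TRAIN_X, TRAIN_Y, TEST_X, TEST_Y, k, fn)
            for name, fn in distances.items()
            for k in ks}


if __name__ == "__main__":
    for (name, k), acc in sorted(sweep().items()):
        print(f"{name:9s} K={k}: accuracy={acc:.2f}")
```

Only odd K values are swept here so that a two-class vote cannot tie. On real data, the same loop would typically run under cross-validation and after the chosen feature selection step, neither of which this sketch attempts.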
