Abstract

Breast cancer is one of the most common and fatal cancers in women today. To detect the differential features needed to diagnose breast cancer patients correctly, this paper proposes a 2D feature selection algorithm. It is named SDP because it uses the Standard Deviation and the Pearson correlation coefficient to define feature discernibility and feature independence, respectively. The importance of a feature is defined as the product of its discernibility and its independence. A 2D space is constructed with discernibility as the x-axis and independence as the y-axis, so that the area of the rectangle enclosed by a feature's coordinate lines and the axes represents the importance of that feature. Features in the upper-right corner of the 2D space have much higher importance than the rest, and they comprise the differential feature subset. The spectral clustering algorithms SC_SD and SC_MD are then applied to the selected features, giving two algorithms, SDP+SC_SD and SDP+SC_MD, for clustering analysis of the breast cancer datasets WBCD (Wisconsin Breast Cancer Database), WDBC (Wisconsin Diagnostic Breast Cancer), and WPBC (Wisconsin Prognostic Breast Cancer). The experimental results demonstrate that the proposed SDP detects much better differential features of breast cancer than the other feature selection algorithms compared, and that SDP+SC_SD and SDP+SC_MD outperform algorithms without an embedded feature selection process, such as SC_SD, SC_MD, DPC, SD_DPC, K-means and SD_K-medoids, in terms of clustering accuracy, AMI (Adjusted Mutual Information), ARI (Adjusted Rand Index), sensitivity, specificity and precision.
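The scoring idea described in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation. The abstract states that discernibility comes from the standard deviation, independence from the Pearson correlation coefficient, and importance from their product, but it does not give the exact independence formula. The rule used here (one minus the largest absolute correlation with any more discernible feature, and 1 for the most discernible feature) is an assumption, as are the function names `sdp_scores` and `select_top` and the choice of a fixed subset size `k`. The sketch also assumes that features are on comparable scales and that no feature is constant.

```python
import numpy as np


def sdp_scores(X):
    """Return (discernibility, independence, importance) for each column of X.

    X is an (n_samples, n_features) array. Hypothetical helper for illustration.
    """
    X = np.asarray(X, dtype=float)

    # Discernibility: standard deviation of each feature (as in the abstract).
    disc = X.std(axis=0)

    # Absolute Pearson correlation between every pair of features.
    corr = np.abs(np.corrcoef(X, rowvar=False))

    n_features = X.shape[1]
    indep = np.empty(n_features)
    for i in range(n_features):
        stronger = disc > disc[i]  # features more discernible than feature i
        if stronger.any():
            # ASSUMPTION: independence is one minus the strongest absolute
            # correlation with any more discernible feature.
            indep[i] = 1.0 - corr[i, stronger].max()
        else:
            # ASSUMPTION: the most discernible feature gets independence 1.
            indep[i] = 1.0

    # Importance: area of the rectangle spanned in the 2D space.
    return disc, indep, disc * indep


def select_top(importance, k):
    """Indices of the k features with the largest importance (hypothetical)."""
    return np.argsort(importance)[::-1][:k]
```

Plotting `disc` against `indep` would reproduce the 2D space from the abstract, with the candidates for the differential feature subset appearing toward the upper-right corner. A redundant feature that is strongly correlated with a more discernible one receives low independence and therefore low importance, even if its own standard deviation is large.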
