Abstract

This paper presents an algorithm for recognizing differential features of breast cancer patients, based on the minimum spanning tree (MST) and F-statistics. The algorithm clusters the features of breast cancer data with an MST clustering algorithm and uses F-statistics to determine the proper number of feature clusters. The features most relevant to the class labels are selected from each feature cluster to form the differential features. Samples described by the recognized features are then clustered with the MST clustering algorithm. The algorithm is validated by its clustering accuracy on the WDBC breast cancer dataset. In the experiments, correlations between features and class labels, and similarities between features, are measured by the cosine similarity and the Pearson correlation coefficient. Similarities between samples are measured by the cosine similarity, the Euclidean distance, and the Pearson correlation coefficient. The experimental results show that the highest clustering accuracy is obtained when the cosine similarity measures both the correlations between features and class labels and the similarities between features, while the Euclidean distance measures the similarities between samples. The recognized features are mean radius, mean fractal dimension, and standard error of fractal dimension.
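
The feature-selection step described above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The abstract does not state how the MST is partitioned or how the F-statistic is computed, so this sketch assumes a common variant (cut the k-1 heaviest MST edges) and takes the number of clusters `k` as a hand-chosen parameter instead of deriving it from F-statistics. The distance `1 - |cosine similarity|` and all function names (`cosine_similarity`, `minimum_spanning_tree`, `mst_clusters`, `select_features`) are hypothetical choices made for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def minimum_spanning_tree(dist):
    """Prim's algorithm on a dense distance matrix; returns (weight, i, j) edges."""
    n = len(dist)
    in_tree = [False] * n
    in_tree[0] = True
    # best[j] = (cheapest known distance from the tree to j, tree node giving it)
    best = [(dist[0][j], 0) for j in range(n)]
    edges = []
    for _ in range(n - 1):
        j = min((k for k in range(n) if not in_tree[k]), key=lambda k: best[k][0])
        edges.append((best[j][0], best[j][1], j))
        in_tree[j] = True
        for k in range(n):
            if not in_tree[k] and dist[j][k] < best[k][0]:
                best[k] = (dist[j][k], j)
    return edges


def mst_clusters(dist, k):
    """Assumed partition rule: drop the k-1 heaviest MST edges, return components."""
    edges = sorted(minimum_spanning_tree(dist))
    kept = edges[: len(edges) - (k - 1)]
    parent = list(range(len(dist)))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for _, i, j in kept:
        parent[find(i)] = find(j)
    groups = {}
    for i in range(len(dist)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def select_features(columns, labels, k):
    """From each feature cluster, pick the feature most similar to the labels."""
    n = len(columns)
    # Assumed feature distance: 1 - |cosine similarity| between feature columns.
    dist = [[1.0 - abs(cosine_similarity(columns[i], columns[j])) for j in range(n)]
            for i in range(n)]
    clusters = mst_clusters(dist, k)
    return [max(c, key=lambda f: abs(cosine_similarity(columns[f], labels)))
            for c in clusters]
```

Here `columns` is a list of feature columns (one list of values per feature) and `labels` is the class-label vector of the same length. Calling `select_features(columns, labels, k)` returns one feature index per cluster. The same `mst_clusters` helper could then be applied to a sample-by-sample Euclidean distance matrix to cluster samples on the selected features.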
