Abstract
In previous work, semi-supervised Fuzzy c-means (ssFCM) was used to classify the Nottingham Tenovus Breast Cancer (NTBC) dataset, as no automatic method for doing so currently exists. However, the results were poor when compared with semi-manual classification. The NTBC data is known to be highly non-normal, and it was suspected that this contributed to the poor results. This motivated a further investigation into alternative distance metrics to explore their effect on the classification results. Mahalanobis, Euclidean and kernel-based distance metrics were used on 100 sets of randomly selected labelled data. It was found that ssFCM with Euclidean distance successfully and automatically identified the six classes, in close agreement with those of Soria et al. We also show that the key features that define the breast cancer classes agree closely with those of Soria et al. The superiority of Euclidean distance over Mahalanobis distance for classifying this dataset is unexpected, because Euclidean distance can only generate spherical clusters, whereas Mahalanobis distance can generate hyperellipsoidal clusters, of which spherical clusters are a special case. We had expected Mahalanobis distance to generate the hyperellipsoidal clusters that would best fit the NTBC data.
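The geometric difference between the two metrics can be illustrated with a minimal sketch. This is not the ssFCM implementation used in the study; the function names, the cluster centre and the covariance matrix below are hypothetical values chosen for illustration only. The sketch assumes the usual definitions: squared Euclidean distance (x - v)^T (x - v) and squared Mahalanobis distance (x - v)^T S^{-1} (x - v) for a centre v and covariance matrix S.

```python
import numpy as np


def squared_euclidean(x, centre):
    """Squared Euclidean distance: every direction is weighted equally."""
    diff = np.asarray(x, dtype=float) - np.asarray(centre, dtype=float)
    return float(diff @ diff)


def squared_mahalanobis(x, centre, cov):
    """Squared Mahalanobis distance: directions are rescaled by the covariance."""
    diff = np.asarray(x, dtype=float) - np.asarray(centre, dtype=float)
    # Solve cov @ y = diff rather than forming the inverse explicitly.
    return float(diff @ np.linalg.solve(np.asarray(cov, dtype=float), diff))


# Hypothetical cluster: centred at the origin, stretched along the first axis.
centre = np.array([0.0, 0.0])
cov = np.array([[4.0, 0.0],
                [0.0, 1.0]])

a = np.array([2.0, 0.0])  # lies along the stretched axis
b = np.array([0.0, 2.0])  # lies along the narrow axis

# Euclidean distance treats both points as equally far from the centre.
euclid_a = squared_euclidean(a, centre)
euclid_b = squared_euclidean(b, centre)

# Mahalanobis distance treats the point on the stretched axis as closer.
mahal_a = squared_mahalanobis(a, centre, cov)
mahal_b = squared_mahalanobis(b, centre, cov)

# With an identity covariance the two metrics coincide (the spherical case).
mahal_identity = squared_mahalanobis(a, centre, np.eye(2))

print(euclid_a, euclid_b, mahal_a, mahal_b, mahal_identity)
```

In this toy setting the points at equal Euclidean distance form a circle around the centre, while the points at equal Mahalanobis distance form an ellipse aligned with the covariance, which is why the latter was expected to fit elongated clusters better.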