Abstract

In previous work, semi-supervised fuzzy c-means (ssFCM) was used to classify the Nottingham Tenovus Breast Cancer (NTBC) dataset automatically, as no method for doing so currently exists. However, the results were poor compared with semi-manual classification. The NTBC data is known to be highly non-normal, and it was suspected that this contributed to the poor results. This motivated a further investigation into how alternative distance metrics affect the classification results. Mahalanobis, Euclidean and kernel-based distance metrics were each applied to 100 sets of randomly selected labelled data. ssFCM with Euclidean distance successfully and automatically identified the six classes, in close agreement with those of Soria et al. The key features that define the breast cancer classes also agree closely with those of Soria et al. The superiority of Euclidean distance over Mahalanobis distance for classifying this dataset is unexpected: Euclidean distance can only generate spherical clusters, whereas Mahalanobis distance can generate hyperellipsoidal clusters, of which spherical clusters are a special case. We had expected the hyperellipsoidal clusters generated by Mahalanobis distance to fit the NTBC data best.
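The geometric contrast between the two metrics can be illustrated with a minimal sketch. It is not the authors' implementation and does not use NTBC data: the function names `euclidean` and `mahalanobis_2d`, the two-dimensional setting, and the covariance values are assumptions chosen purely for illustration.

```python
import math


def euclidean(x, c):
    """Euclidean distance from point x to cluster centre c."""
    return math.sqrt(sum((xi - ci) ** 2 for xi, ci in zip(x, c)))


def mahalanobis_2d(x, c, cov):
    """Mahalanobis distance in 2-D for a 2x2 covariance matrix cov.

    Hypothetical helper: computes sqrt((x - c)^T cov^{-1} (x - c))
    using the closed-form inverse of a 2x2 matrix.
    """
    (a, b), (b2, d) = cov
    det = a * d - b * b2
    if det == 0:
        raise ValueError("covariance matrix is singular")
    dx, dy = x[0] - c[0], x[1] - c[1]
    # inverse of [[a, b], [b2, d]] is (1 / det) * [[d, -b], [-b2, a]]
    q = (d * dx * dx - (b + b2) * dx * dy + a * dy * dy) / det
    return math.sqrt(q)


if __name__ == "__main__":
    centre = (0.0, 0.0)
    # Illustrative covariance: variance 4 along the first axis, 1 along the second.
    stretched = ((4.0, 0.0), (0.0, 1.0))
    identity = ((1.0, 0.0), (0.0, 1.0))
    for point in [(2.0, 0.0), (0.0, 2.0)]:
        print(point,
              euclidean(point, centre),
              mahalanobis_2d(point, centre, stretched),
              mahalanobis_2d(point, centre, identity))
```

In this toy setting both points lie at Euclidean distance 2 from the centre, while their Mahalanobis distances under the stretched covariance are 1 and 2, so points at equal Mahalanobis distance trace an ellipse rather than a circle. With the identity covariance the two metrics coincide, which is the sense in which spherical clusters are a special case of hyperellipsoidal ones.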
