Abstract
In previous work, six clinically novel and useful subgroups of breast cancer were identified by using rules and clinicians' expertise to combine the solutions of three different clustering algorithms applied to a database of biomarkers. The motivation for the present work is to reproduce this classification using a single clustering method, semi-supervised fuzzy c-means (ssFCM). In the long term, we hope to produce a clinically useful classification using fewer features (biomarkers), reducing the time and cost of running complex and expensive clinical tests. Hence, the aim of this paper is to investigate the use of feature selection in combination with ssFCM to reduce the number of features while maintaining accuracy (defined as agreement with the previous classification), both on our breast cancer biomarker data and on other benchmark datasets. We present experimental results for four feature selection techniques, selecting 10, 15 and 17 features out of the original 25 breast cancer biomarkers. We varied the amount of labelled data (10% to 60% of the training data) and evaluated classification accuracy using cross-validation. Classification accuracy increased when 15 or 17 breast cancer biomarkers were used. With the SVM-RFE and CFS feature selection techniques, classification accuracy also improved on three UCI datasets: Arrhythmia, Cardiotocography and Yeast.
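To make the role of the labelled fraction concrete, the following is a minimal sketch of a simplified semi-supervised fuzzy c-means, written in plain Python. It is not the formulation used in this work: it assumes the simplest possible variant, in which labelled points keep a fixed one-hot membership and unlabelled points receive the standard fuzzy c-means membership update. The function name `ssfcm`, its parameters, and the toy data are all hypothetical and chosen only for illustration.

```python
import math


def ssfcm(points, labels, n_clusters, m=2.0, n_iter=50):
    """Simplified semi-supervised fuzzy c-means (illustrative assumption only).

    points: list of equal-length lists of floats
    labels: class index (0..n_clusters-1) for labelled points, None otherwise
    m:      fuzzifier, must be greater than 1
    Returns (centres, memberships).
    """
    n, dim = len(points), len(points[0])

    def one_hot(c):
        return [1.0 if i == c else 0.0 for i in range(n_clusters)]

    # Labelled points are clamped to their class; unlabelled points start uniform.
    u = [one_hot(lab) if lab is not None else [1.0 / n_clusters] * n_clusters
         for lab in labels]

    centres = []
    for _ in range(n_iter):
        # Centre update: mean of the points weighted by membership ** m.
        centres = []
        for i in range(n_clusters):
            w = [u[k][i] ** m for k in range(n)]
            total = sum(w)
            centres.append([sum(w[k] * points[k][d] for k in range(n)) / total
                            for d in range(dim)])

        # Membership update, applied to unlabelled points only.
        for k, x in enumerate(points):
            if labels[k] is not None:
                continue
            dist = [math.dist(x, c) for c in centres]
            if min(dist) == 0.0:
                # The point coincides with a centre: assign it fully.
                u[k] = one_hot(dist.index(0.0))
                continue
            inv = [d ** (-2.0 / (m - 1.0)) for d in dist]
            s = sum(inv)
            u[k] = [v / s for v in inv]
    return centres, u


# Toy data: two well-separated groups, one labelled point in each.
toy_points = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]]
toy_labels = [0, None, None, 1, None, None]

toy_centres, toy_u = ssfcm(toy_points, toy_labels, n_clusters=2)
# Harden the fuzzy memberships into a class prediction per point.
toy_predicted = [row.index(max(row)) for row in toy_u]
```

In a feature selection experiment of the kind described above, each point would first be restricted to the selected feature columns before being passed to such a routine, and the proportion of non-`None` entries in `labels` would correspond to the labelled fraction.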