Abstract

Many feature selection methods have been proposed, most of them within the supervised learning paradigm. Recently, unsupervised feature selection has attracted considerable attention, especially in bioinformatics and text mining. So far, supervised and unsupervised feature selection methods have been studied and developed separately. A subset selected by a supervised feature selection method may not be a good one for unsupervised learning, and vice versa. In bioinformatics research, however, it is very common to perform clustering and classification iteratively on the same data sets, especially in gene expression analysis, so a feature selection method that works well for both unsupervised and supervised learning is highly desirable. In this paper we propose a novel feature selection algorithm through feature clustering. Our algorithm does not need the class label information in the data set and is suitable for both supervised and unsupervised learning. It groups the features into clusters based on feature similarity, so that the features in the same cluster are similar to each other. A representative feature is then selected from each cluster, which reduces feature redundancy. Our algorithm uses feature similarity to reduce feature redundancy but requires no feature search, and it works very well on high-dimensional data sets. We test our algorithm on several biological data sets for both clustering and classification analysis, and the results indicate that our FSFC algorithm can significantly reduce the size of the original data sets without sacrificing the quality of clustering and classification.
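
The general idea described above (cluster the features by similarity, then keep one representative per cluster) can be illustrated with a minimal sketch. This is not the FSFC algorithm itself: the abstract does not specify the similarity measure, the clustering procedure, or how the representative is chosen. The sketch assumes absolute Pearson correlation as the feature similarity, average-linkage agglomerative merging, and the most central member of each cluster as its representative. The names pearson and select_by_feature_clustering are hypothetical. No class labels are used.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_by_feature_clustering(rows, n_clusters):
    """Cluster features (columns) by similarity and pick one per cluster.

    rows: list of samples, each a list of feature values.
    Returns (sorted selected feature indices, list of feature clusters).
    All modelling choices here are illustrative assumptions.
    """
    columns = list(zip(*rows))  # one tuple of values per feature
    d = len(columns)

    # Assumed similarity: absolute Pearson correlation between features.
    sim = [[abs(pearson(columns[i], columns[j])) for j in range(d)]
           for i in range(d)]

    # Assumed clustering: start with singletons and repeatedly merge the
    # two clusters with the highest average pairwise similarity.
    clusters = [[i] for i in range(d)]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                total = sum(sim[i][j] for i in clusters[a] for j in clusters[b])
                link = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or link > best[0]:
                    best = (link, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]

    # Assumed representative: the member most similar to the rest of its cluster.
    selected = [max(members, key=lambda i: sum(sim[i][j] for j in members))
                for members in clusters]
    return sorted(selected), clusters


if __name__ == "__main__":
    # Toy data: features 0 and 1 are nearly collinear; 2 and 3 are exact negatives.
    data = [
        [1.0, 2.1, 2.0, -2.0],
        [2.0, 3.9, 5.0, -5.0],
        [3.0, 6.2, 1.0, -1.0],
        [4.0, 7.8, 4.0, -4.0],
        [5.0, 10.1, 3.0, -3.0],
    ]
    chosen, groups = select_by_feature_clustering(data, n_clusters=2)
    print("clusters:", groups)
    print("selected features:", chosen)
```

On the toy data, the two redundant pairs end up in separate clusters and one feature is kept from each, so four features are reduced to two. The pairwise merge loop is written for readability and scales poorly with the number of features; for gene expression data with thousands of features, a library implementation of hierarchical clustering would be a more practical choice.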

