Abstract

Feature selection is the process of selecting a subset of relevant features that has the same, or nearly the same, predictive capability as the original feature set. It is a necessary step before classification, because a classifier is difficult to train on a data set with a high number of dimensions (features) and, as a result, may not give optimal results. Rough Set Theory (RST) is a mathematical approach to feature selection that requires no additional information about the data. It can extract the relevant features from a high-dimensional data set containing noisy, imprecise, vague, and redundant information. In this paper, several high-dimensional gene expression data sets are used for feature selection. Features are selected using the proposed rough set algorithm, and the accuracies of several classifiers are compared before and after feature reduction. The generalization capability of the algorithm is also demonstrated by applying it to a data set from physics in addition to the gene expression data sets.
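The abstract does not spell out the proposed algorithm. As a point of reference only, the following sketch implements the classic dependency-based QuickReduct heuristic from rough set theory, a well-known technique swapped in for illustration and not necessarily the authors' method. For a set of condition attributes P and decision D, objects are grouped by the indiscernibility relation IND(P); the positive region POS_P(D) is the union of the groups whose objects all share one decision label, and the dependency degree is γ_P(D) = |POS_P(D)| / |U|. The search greedily adds the attribute that maximizes γ until it equals the dependency of the full attribute set. It assumes the expression values have already been discretized, since the basic rough set model works on nominal values. The names `partition`, `dependency`, `quick_reduct`, `TABLE`, and `LABELS` are hypothetical.

```python
from collections import defaultdict
from typing import Hashable, Sequence

Row = Sequence[Hashable]


def partition(rows: Sequence[Row], attrs: Sequence[int]) -> list[list[int]]:
    """Equivalence classes of IND(attrs): objects are grouped together
    when they agree on every attribute index in `attrs`."""
    blocks: dict[tuple, list[int]] = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())


def dependency(rows: Sequence[Row], labels: Sequence[Hashable],
               attrs: Sequence[int]) -> float:
    """gamma_attrs(D) = |POS_attrs(D)| / |U|. A class is in the positive
    region when all of its objects carry the same decision label."""
    pos = sum(len(block) for block in partition(rows, attrs)
              if len({labels[i] for i in block}) == 1)
    return pos / len(rows)


def quick_reduct(rows: Sequence[Row], labels: Sequence[Hashable]) -> list[int]:
    """Greedy QuickReduct (hypothetical helper): repeatedly add the attribute
    with the highest dependency until gamma matches that of all attributes."""
    all_attrs = list(range(len(rows[0])))
    target = dependency(rows, labels, all_attrs)
    reduct: list[int] = []
    current = dependency(rows, labels, reduct)
    while current < target:
        candidates = [a for a in all_attrs if a not in reduct]
        # max() keeps the first maximal candidate, so ties go to the lowest index.
        # Always adding the best attribute (even without a gain) guarantees
        # termination: at worst the reduct grows to the full attribute set.
        best = max(candidates, key=lambda a: dependency(rows, labels, reduct + [a]))
        reduct.append(best)
        current = dependency(rows, labels, reduct)
    return reduct


# Hypothetical discretized toy table (0 = low, 1 = high expression).
# Attribute 2 duplicates attribute 0 and attribute 3 is uninformative.
TABLE = [
    (0, 0, 0, 1),
    (0, 1, 0, 0),
    (1, 0, 1, 1),
    (1, 1, 1, 0),
    (1, 1, 1, 1),
    (0, 0, 0, 0),
]
LABELS = ["healthy", "healthy", "healthy", "sick", "sick", "healthy"]

if __name__ == "__main__":
    print(quick_reduct(TABLE, LABELS))  # prints [0, 1]
```

On the toy table, attributes 0 and 1 together determine every label, so the duplicate attribute 2 and the noisy attribute 3 are discarded. Because the search is greedy, on other data it can return a superset of a minimal reduct.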

