Abstract

Feature selection plays a vital role in many fields, particularly pattern recognition and bioinformatics, where informative and relevant features must be selected from high-dimensional datasets. Increasing dimensionality, together with redundant and irrelevant features, makes such data difficult to process and analyse. In this paper, an effective feature selection technique called mutual information and Monte Carlo based feature selection (MIMCFS) is proposed. It comprises two stages. The first stage selects the predominant features from the high-dimensional data. The second stage eliminates redundant features that survived the first stage. For the first stage, a new feature selection strategy based on the approximate Markov blanket and the concept of mutual information is proposed to identify irrelevant and redundant features. In the second stage, to avoid misjudging redundant features as relevant, a new strategy based on the Monte Carlo tree search technique is proposed to eradicate the remaining redundant features and to improve feature interaction. For experimental evaluation, eight benchmark microarray datasets pertaining to cancer analysis, including imbalanced ones, are used. Further, to compare and justify the performance of the proposed method, seven state-of-the-art feature selection techniques, including CFS, Relief, DISR, JMI, CMIM and CMI, are employed. The outputs of these feature selection techniques are provided to three standard classifiers, namely Naive Bayes, SVM and C4.5, to assess the significance of the selected features in building classification models. Ten-fold cross-validation is adopted to evaluate the classifiers. Accuracy, precision, recall, F-measure, standard deviation and statistical significance metrics are measured to quantify classifier performance. Experimental results show that the proposed algorithm outperforms the standard existing methods.
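
To make the first stage more concrete, the following is a minimal sketch of relevance ranking by mutual information followed by approximate Markov blanket pruning. It is not the MIMCFS procedure itself: the abstract does not state the exact criterion, so the sketch assumes a common formulation in which an already selected feature blankets a candidate when it shares at least as much information with the candidate as the candidate shares with the class. The function names, the `threshold` parameter and the toy data are hypothetical, features are assumed to be discretised, and the Monte Carlo tree search stage is not shown.

```python
from collections import Counter
from math import log2


def entropy(xs):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(xs)
    return -sum((c / n) * log2(c / n) for c in Counter(xs).values())


def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two discrete sequences."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def select_features(columns, labels, threshold=0.0):
    """Rank features by relevance, then prune approximately blanketed ones.

    columns:   dict mapping feature name -> list of discrete values
    threshold: hypothetical relevance cutoff (an assumption, not from the paper)
    """
    relevance = {name: mutual_information(col, labels)
                 for name, col in columns.items()}
    # Keep features whose MI with the class exceeds the cutoff, most relevant first.
    ranked = sorted((n for n in relevance if relevance[n] > threshold),
                    key=lambda n: relevance[n], reverse=True)
    selected = []
    for name in ranked:
        # Assumed criterion: a kept feature f (at least as relevant, by ordering)
        # blankets `name` if I(f; name) >= I(name; class).
        redundant = any(
            mutual_information(columns[f], columns[name]) >= relevance[name]
            for f in selected
        )
        if not redundant:
            selected.append(name)
    return selected


# Toy data: f2 duplicates f1, and f3 is independent of the class.
labels = [0, 0, 1, 1, 0, 0, 1, 1]
columns = {
    "f1": [0, 0, 1, 1, 0, 0, 1, 1],
    "f2": [0, 0, 1, 1, 0, 0, 1, 1],
    "f3": [0, 1, 0, 1, 0, 1, 0, 1],
}
print(select_features(columns, labels))  # ['f1']
```

In the toy data, `f3` is dropped as irrelevant because its mutual information with the class is zero, and `f2` is dropped as redundant because it is blanketed by `f1`. On real microarray data, continuous expression values would need to be discretised first, and estimated mutual information is rarely exactly zero, so a positive cutoff would be a more realistic choice.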
