Abstract

In this paper, the authors present an effective information-theoretic feature selection method based on symmetrical uncertainty to classify gene expression microarray data and detect biomarkers in it. Information gain and symmetrical uncertainty are used to rank the features: based on the computed symmetrical uncertainty values, features are sorted from most to least informative. The top features from the sorted list are then passed to random forest, logistic regression, and other well-known classifiers, evaluated with leave-one-out cross-validation, to construct the best classification model(s) and thereby select the most important genes from the microarray datasets. Results on the leukemia and colon cancer datasets, measured by classification accuracy, running time, root mean square error, and other metrics, demonstrate the effectiveness of the proposed approach. The proposed method is also much faster than many wrapper or ensemble methods.
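The ranking step can be illustrated with a minimal Python sketch. It uses the standard definition SU(X, Y) = 2 · IG(X | Y) / (H(X) + H(Y)), where IG(X | Y) = H(X) − H(X | Y), and sorts features by decreasing SU with respect to the class label. Because expression values are continuous, the sketch assumes they are first discretized by equal-width binning. The binning scheme, the bin count, and the helper names (`entropy`, `symmetrical_uncertainty`, `equal_width_bins`, `rank_features`) are illustrative assumptions, not the authors' implementation. The downstream classifiers and leave-one-out evaluation are omitted.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence


def entropy(values: Sequence[Hashable]) -> float:
    """Shannon entropy H(X) in bits of a discrete sequence."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def conditional_entropy(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """H(X | Y): entropy of x within each class of y, weighted by p(y)."""
    n = len(y)
    total = 0.0
    for y_val, count in Counter(y).items():
        subset = [xi for xi, yi in zip(x, y) if yi == y_val]
        total += (count / n) * entropy(subset)
    return total


def symmetrical_uncertainty(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """SU(X, Y) = 2 * IG(X | Y) / (H(X) + H(Y)), a value in [0, 1]."""
    h_x, h_y = entropy(x), entropy(y)
    if h_x + h_y == 0:
        return 0.0
    info_gain = h_x - conditional_entropy(x, y)
    return 2.0 * info_gain / (h_x + h_y)


def equal_width_bins(column: Sequence[float], n_bins: int = 3) -> List[int]:
    """Assumed discretization: map each value to one of n_bins equal-width bins."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0] * len(column)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in column]


def rank_features(samples: Sequence[Sequence[float]],
                  labels: Sequence[Hashable], n_bins: int = 3) -> List[int]:
    """Return feature indices sorted from most to least informative by SU."""
    scores = []
    for j in range(len(samples[0])):
        column = equal_width_bins([row[j] for row in samples], n_bins)
        scores.append((symmetrical_uncertainty(column, labels), j))
    # Highest SU first; ties broken by feature index for determinism.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [j for _, j in scores]


# Toy data (hypothetical): 4 samples x 3 genes, two tumour classes.
X = [[1.0, 2.0, 3.0],
     [1.1, 8.0, 3.0],
     [5.0, 2.1, 3.0],
     [5.2, 7.9, 3.0]]
y = ["ALL", "ALL", "AML", "AML"]
print(rank_features(X, y))  # → [0, 1, 2]
```

Gene 0 separates the two classes perfectly (SU = 1), while gene 1 is uninformative and gene 2 is constant (both SU = 0). The top k indices of the ranking, for example `rank_features(X, y)[:k]`, would then form the reduced feature set handed to the classifiers.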
