Abstract

This study proposes a supervised feature selection technique for high-dimensional binary classification problems that makes the conventional Fisher Score robust. The proposed method uses the median as a robust measure of location and the Rousseeuw and Croux statistic (Qn) as a robust measure of dispersion. First, a minimal subset of genes is identified by a greedy search approach and combined with the top-ranked genes obtained via the proposed Robust Fisher Score (RFish). The Least Absolute Shrinkage and Selection Operator (LASSO) is then applied to remove redundancy among the selected genes. The proposed method is validated on five publicly available datasets and further assessed in a detailed simulation study. Its results are compared with those of six well-known feature selection methods in terms of prediction performance using Random Forest (RF), Support Vector Machine (SVM) and k-Nearest Neighbour (k-NN) classifiers. The findings, presented as boxplots and barplots, show that the proposed method (RFish) outperforms all the other methods in the majority of cases.
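The scoring idea described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state the exact formula, so the sketch assumes the common two-class Fisher Score form (squared difference of class locations divided by the sum of squared class dispersions) with the median and Qn substituted in. The names `qn_scale`, `robust_fisher_score` and `eps` are hypothetical, Qn is computed naively from all pairwise differences with the asymptotic Gaussian consistency constant 2.2219, and finite-sample corrections are omitted.

```python
from math import comb

import numpy as np


def qn_scale(x):
    """Naive O(n^2) Rousseeuw-Croux Qn scale estimate (no finite-sample correction)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    if n < 2:
        return 0.0
    h = n // 2 + 1
    k = comb(h, 2)  # rank of the order statistic among pairwise distances
    i, j = np.triu_indices(n, k=1)  # all index pairs with i < j
    diffs = np.abs(x[i] - x[j])
    # k-th smallest pairwise distance, scaled by the asymptotic constant
    return 2.2219 * np.partition(diffs, k - 1)[k - 1]


def robust_fisher_score(X, y, eps=1e-12):
    """Assumed robust Fisher Score per feature for a two-class problem.

    score_j = (median_a - median_b)^2 / (Qn_a^2 + Qn_b^2 + eps)
    This form is an assumption for illustration; eps only guards
    against division by zero.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    if classes.size != 2:
        raise ValueError("exactly two classes are required")
    a = X[y == classes[0]]
    b = X[y == classes[1]]
    num = (np.median(a, axis=0) - np.median(b, axis=0)) ** 2
    qa = np.array([qn_scale(a[:, j]) for j in range(X.shape[1])])
    qb = np.array([qn_scale(b[:, j]) for j in range(X.shape[1])])
    return num / (qa ** 2 + qb ** 2 + eps)


if __name__ == "__main__":
    # Toy data: feature 0 separates the classes, feature 1 does not.
    X = np.array(
        [[0, 0], [1, 1], [2, 2], [3, 3], [4, 4],
         [10, 0], [11, 1], [12, 2], [13, 3], [14, 4]],
        dtype=float,
    )
    y = np.array([0] * 5 + [1] * 5)
    scores = robust_fisher_score(X, y)
    ranking = np.argsort(scores)[::-1]  # features ordered from highest score
    print(scores, ranking)
```

Ranking features by this score gives the "top-ranked genes" part of the pipeline only; the greedy search and LASSO steps described in the abstract are not shown here.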

