Abstract

Microarray data analysis relates gene expression profiles directly to disease state, and relies on a chain of methods running from feature extraction to classification. This paper studies eight feature selection preprocessing variants derived from four feature selection methods: Minimum Redundancy-Maximum Relevance (mRMR), Max Relevance (MaxRel), Quadratic Programming Feature Selection (QPFS) and Partial Least Squares (PLS). Microarray datasets for colon cancer and leukemia were used to implement and test four classifiers: K-Nearest-Neighbor (KNN), Random Forest (RF), Support Vector Machine (SVM) and Neural Network (NN). Performance was measured by accuracy and by the area under the curve (AUC). The experimental results show that discretization can improve the performance of microarray data analysis to some extent, and that mRMR gives the best performance on the colon and leukemia datasets. We also report comparative results for the methods at specific (data-ratio) numbers of features.
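As a rough illustration of the two ideas the abstract combines, discretization and relevance-based feature ranking, the following minimal sketch discretizes each gene into three states and ranks genes by their mutual information with the class label (the MaxRel criterion). The thresholds at mean ± 0.5 standard deviations, the function names, and the data layout (one list of per-sample values per gene) are assumptions made for this example and are not taken from the paper.

```python
import math
from collections import Counter


def discretize(values, k=0.5):
    """Map values to -1, 0, or 1 using mean +/- k*std cut points.

    The three-state scheme and the default k are assumptions for this sketch.
    """
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    lo, hi = mean - k * std, mean + k * std
    return [-1 if v < lo else (1 if v > hi else 0) for v in values]


def mutual_information(xs, ys):
    """Mutual information, in bits, between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # p(x,y) * log2(p(x,y) / (p(x) * p(y))), written with raw counts
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def max_relevance(features, labels, top_k):
    """Return indices of the top_k genes ranked by relevance to the labels."""
    scores = [mutual_information(discretize(col), labels) for col in features]
    order = sorted(range(len(features)), key=lambda i: scores[i], reverse=True)
    return order[:top_k]
```

Calling `max_relevance(features, labels, top_k)` with one list of expression values per gene returns the indices of the genes to pass on to a classifier. mRMR would extend this ranking by also penalizing mutual information between the selected genes, which this sketch does not do.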
