Abstract

Gene expression profiles are a valuable data source for studying tumors, but they are high-dimensional and highly redundant, which makes gene selection a critical step in microarray data classification. This paper proposes a feature selection method based on the maximum mutual information coefficient and graph theory. Each feature (gene) of the expression data is treated as a vertex of a graph, and the maximum mutual information coefficient between genes measures the relationship between vertices and is used to construct an undirected graph. Core and coritivity theory is then applied to this graph to determine the feature subset. To avoid reliance on any single classifier or evaluation metric, we evaluate classification performance with three different classification models and three evaluation metrics: accuracy, F1-score, and AUC. Experimental results on six different types of genetic data show that the proposed algorithm achieves high accuracy and robustness compared with other advanced feature selection methods. Because the method considers the importance of features and the correlation between them at the same time, it addresses the problem of gene selection in microarray data classification.
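
The graph-based pipeline described above can be illustrated with a small sketch. It is not the paper's implementation, and it rests on several assumptions. The absolute Pearson correlation stands in for the maximum mutual information coefficient. An edge is drawn when the score exceeds a hypothetical `threshold`. The core is found by brute force using the commonly cited definition of coritivity, h(G) = max over vertex cut sets S of (number of connected components of G - S) minus |S|, which is only feasible for very small graphs. All function names are illustrative.

```python
from itertools import combinations
from math import sqrt


def abs_pearson(x, y):
    """Absolute Pearson correlation; an assumed stand-in for MIC."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return abs(cov / (sx * sy)) if sx > 0 and sy > 0 else 0.0


def build_graph(features, threshold):
    """Connect two features when their dependence score exceeds `threshold`."""
    adj = {i: set() for i in range(len(features))}
    for i, j in combinations(range(len(features)), 2):
        if abs_pearson(features[i], features[j]) > threshold:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def count_components(adj, removed):
    """Count connected components after deleting the vertices in `removed`."""
    seen, components = set(removed), 0
    for start in adj:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        stack = [start]
        while stack:  # depth-first traversal of one component
            v = stack.pop()
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
    return components


def core_and_coritivity(adj):
    """Brute-force the cut set S that maximizes components(G - S) - |S|."""
    vertices = list(adj)
    best_core, best_score = None, None
    # Leave at least two vertices so that G - S can be disconnected.
    for size in range(1, len(vertices) - 1):
        for subset in combinations(vertices, size):
            comps = count_components(adj, subset)
            if comps < 2:  # S does not disconnect the graph
                continue
            score = comps - size
            if best_score is None or score > best_score:
                best_core, best_score = set(subset), score
    return best_core, best_score
```

For a star graph with hub `0` and leaves `1`, `2`, `3`, calling `core_and_coritivity({0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}})` identifies the hub as the core, since removing it splits the graph into three components. The exhaustive search grows exponentially with the number of vertices, so real gene expression data would need a heuristic or approximation in its place.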
