Improving the performance of imputation methods for gene expression classification using feature selection

Cao Truong Tran

doi:10.1109/rivf55975.2022.10013809

Abstract

Gene expression data has been successfully used for cancer classification. However, gene expression data often contains a large number of missing values, which cause serious problems for classification. A common approach to classifying incomplete data is to use imputation methods to estimate the missing values before constructing classifiers. However, because gene expression data contains a large number of redundant features, imputation methods are both ineffective and inefficient on it. Feature selection is a popular way to remove redundant features, but it has not been investigated as a means of improving imputation for gene expression data. Therefore, this paper proposes integrating feature selection with imputation to address this problem. Experimental results show that the proposed method not only improves classification accuracy but also speeds up the imputation process.
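The abstract does not say which feature selection or imputation methods the paper uses. The Python below is only an illustrative sketch of the order of operations it describes: select features first, then impute only the retained ones. It assumes a signal-to-noise filter (scored on observed values only, binary labels 0/1) and a k-nearest-neighbour imputer. Both are stand-ins, not the paper's method. All function names (`snr_score`, `select_features`, `knn_impute`, `select_then_impute`) and the toy data are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Row = List[Optional[float]]  # None marks a missing expression value


def snr_score(column: Sequence[Optional[float]], labels: Sequence[int]) -> float:
    """Signal-to-noise filter score |mu0 - mu1| / (sd0 + sd1).

    Uses only observed (non-None) values, so it can run before imputation.
    Assumption: binary class labels 0 and 1.
    """
    groups: Dict[int, List[float]] = {0: [], 1: []}
    for value, label in zip(column, labels):
        if value is not None:
            groups[label].append(value)
    if len(groups[0]) < 2 or len(groups[1]) < 2:
        return 0.0  # too few observed values to estimate a spread
    stats = []
    for vals in (groups[0], groups[1]):
        mu = sum(vals) / len(vals)
        sd = math.sqrt(sum((v - mu) ** 2 for v in vals) / (len(vals) - 1))
        stats.append((mu, sd))
    (mu0, sd0), (mu1, sd1) = stats
    return abs(mu0 - mu1) / (sd0 + sd1 + 1e-12)


def select_features(data: List[Row], labels: Sequence[int], k: int) -> List[int]:
    """Return the indices of the k highest-scoring features, in column order."""
    n_features = len(data[0])
    scores = [snr_score([row[j] for row in data], labels) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: (-scores[j], j))
    return sorted(ranked[:k])


def knn_impute(data: List[Row], n_neighbors: int = 2) -> List[List[float]]:
    """Fill each missing entry with the mean of that feature over the
    n_neighbors closest rows that observe it. Distance is the root mean
    squared difference over features observed in both rows."""
    filled: List[List[float]] = []
    for i, row in enumerate(data):
        new_row: List[float] = []
        for j, value in enumerate(row):
            if value is not None:
                new_row.append(value)
                continue
            candidates: List[Tuple[float, float]] = []
            for r, other in enumerate(data):
                other_value = other[j]
                if r == i or other_value is None:
                    continue
                sq = [(a - b) ** 2 for a, b in zip(row, other)
                      if a is not None and b is not None]
                if sq:  # skip rows sharing no observed feature with this row
                    candidates.append((math.sqrt(sum(sq) / len(sq)), other_value))
            candidates.sort()
            nearest = [v for _, v in candidates[:n_neighbors]]
            if not nearest:
                # Fallback: observed column mean (0.0 if the column is empty).
                nearest = [o[j] for o in data if o[j] is not None] or [0.0]
            new_row.append(sum(nearest) / len(nearest))
        filled.append(new_row)
    return filled


def select_then_impute(data: List[Row], labels: Sequence[int], k: int,
                       n_neighbors: int = 2) -> Tuple[List[int], List[List[float]]]:
    """Hypothetical pipeline: feature selection first, imputation second."""
    keep = select_features(data, labels, k)
    reduced = [[row[j] for j in keep] for row in data]
    return keep, knn_impute(reduced, n_neighbors)


# Hypothetical toy data: features 0 and 2 separate the classes, feature 1 is noise.
TOY_DATA: List[Row] = [
    [1.0, 5.0, 0.1],
    [1.2, None, 0.2],
    [0.9, 5.2, None],
    [3.0, 5.1, 2.0],
    [3.2, 4.9, 2.2],
    [None, 5.0, 2.1],
]
TOY_LABELS = [0, 0, 0, 1, 1, 1]

if __name__ == "__main__":
    keep, filled = select_then_impute(TOY_DATA, TOY_LABELS, k=2)
    print(keep)
    print(filled)
```

On the toy data the sketch keeps features 0 and 2. It fills the missing entry of the sample at index 5 with about 3.1, the mean of its two nearest class-1 neighbours, and the noisy feature 1 never enters the distance computation. Because imputation distances are computed over only the k retained features, this ordering is one plausible source of the speed-up the abstract reports. In a real evaluation the selection step should be fitted on training data only, so that test labels do not leak into imputation.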

Full Text