A novel biclustering based missing value prediction method for microarray gene expression data

Samiran Chattopadhyay, Chandra Das, Shilpi Bose

doi:10.1109/mami.2015.7456603

Abstract

The presence of missing values in microarray gene expression data creates severe problems during downstream data analysis, as analysis algorithms require complete gene expression profiles. Effective missing value prediction methods are therefore essential to generate complete data. In this regard, a new biclustering-based sequential missing value imputation method is proposed here to predict missing values in microarray gene expression data. Starting from the gene with the lowest missing rate, for each missing position the proposed method computes a bicluster by selecting a subset of similar genes and a subset of similar samples or conditions using a novel distance measure. The imputation is then carried out sequentially by computing the weighted average of the neighbour genes and samples. To evaluate its performance, the proposed method is rigorously tested and compared with some of the well-known existing methods. The effectiveness of the proposed method is demonstrated on different microarray data sets, including time series, non-time series, and mixed data.
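
The sequential procedure outlined in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not define the novel distance measure or the exact weighting, so the sketch substitutes a root-mean-square distance over co-observed entries and inverse-distance weights. The names `masked_distance`, `impute_sequential`, `k_genes`, `k_samples`, and `eps` are hypothetical, and plain Python lists are used to keep the example dependency-free.

```python
import math


def masked_distance(a, b):
    """RMS difference over positions observed (non-NaN) in both vectors.

    Stand-in for the paper's distance measure, which the abstract does not specify.
    """
    diffs = [(u - v) ** 2 for u, v in zip(a, b)
             if not (math.isnan(u) or math.isnan(v))]
    return math.sqrt(sum(diffs) / len(diffs)) if diffs else math.inf


def impute_sequential(data, k_genes=5, k_samples=3, eps=1e-9):
    """Fill NaN entries gene by gene, least-incomplete gene first (illustrative)."""
    x = [[float(v) for v in row] for row in data]  # work on a copy
    n_genes, n_samples = len(x), len(x[0])
    # Rows have equal length, so ordering by missing count equals ordering by rate.
    order = sorted(range(n_genes),
                   key=lambda g: sum(math.isnan(v) for v in x[g]))
    for g in order:
        for s in range(n_samples):
            if not math.isnan(x[g][s]):
                continue
            # Gene subset: nearest genes that have a value in column s.
            cand = [h for h in range(n_genes)
                    if h != g and not math.isnan(x[h][s])]
            if not cand:
                continue  # nothing to borrow from; leave the gap
            cand.sort(key=lambda h: masked_distance(x[g], x[h]))
            genes = cand[:k_genes]
            # Sample subset: columns most similar to column s across those genes.
            target_col = [x[h][s] for h in genes]
            cols = [t for t in range(n_samples)
                    if t != s and not math.isnan(x[g][t])]
            cols.sort(key=lambda t: masked_distance(
                target_col, [x[h][t] for h in genes]))
            cols = cols[:k_samples]
            # Inverse-distance weights measured inside the bicluster (genes x cols).
            g_sub = [x[g][t] for t in cols]
            weights = [1.0 / (masked_distance(g_sub, [x[h][t] for t in cols]) + eps)
                       for h in genes]
            if sum(weights) == 0.0:  # no comparable entries: fall back to plain mean
                weights = [1.0] * len(genes)
            # Writing the estimate back lets later imputations reuse it.
            x[g][s] = sum(w * x[h][s] for w, h in zip(weights, genes)) / sum(weights)
    return x
```

Because each estimate is written back into the working copy before the next gene is processed, later imputations can draw on earlier ones, which is the sense in which the sketch is sequential. The input is left unmodified, and entries with no donor gene remain missing.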

Full Text