Abstract

Background: Batch effects are commonly introduced into gene expression data and can dramatically reduce the accuracy of statistical inference in genomic analysis, since samples from different batches are not directly comparable.

Objective: To measure biological variability accurately and obtain correct statistical inference, we aim to correct or remove batch effects so that samples from different batches can be merged into a single comparable dataset for high-throughput genomic analysis.

Methods: The existing L/S model uses empirical Bayes methods to estimate constant multiplicative/additive adjustment values for each gene. In contrast, we take a dimensionality-reduction approach: we propose an effective scaling method that multiplies each gene by a constant value, formulated as an optimization problem based on spectral clustering. Data samples from different batches can then be merged into a comparable dataset with the batch effects corrected. We further propose an approximate solution to the optimization problem for the scaling adjustment values.

Results: We evaluated the proposed method on both artificial and gene expression datasets, comparing it against well-established batch effect correction methods. Numerical experiments show that the proposed method projects data samples from different batches to resemble one another and outperforms the alternatives on both microarray and single-cell RNA-seq datasets.

Conclusion: The per-gene scaling adjustment combined with dimensionality reduction improves accuracy and removes batch effects, making the proposed method more robust against interfering genes.
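To make the gene-wise scaling idea in the Methods concrete, the following is a minimal illustrative sketch, not the paper's actual algorithm. It multiplies each gene (row) by one constant so that its spread matches across two batches before merging. The function name scale_genes_for_batch_merge, the simple variance-matching rule for choosing the constants, and the toy data are all assumptions for illustration; the paper instead obtains the per-gene constants by solving a spectral-clustering-based optimization.

```python
import numpy as np

def scale_genes_for_batch_merge(X_a, X_b, eps=1e-8):
    """Hypothetical sketch: align batch B to batch A by multiplying each
    gene (row) by a constant.

    X_a, X_b : (genes x samples) expression matrices from two batches.
    Returns X_b rescaled gene-by-gene to match the spread of X_a.
    """
    # Per-gene standard deviation within each batch (eps avoids division by zero).
    sd_a = X_a.std(axis=1) + eps
    sd_b = X_b.std(axis=1) + eps
    # One multiplicative constant per gene. Variance matching is a simple
    # stand-in here; the paper derives these constants from a
    # spectral-clustering-based optimization instead.
    c = sd_a / sd_b
    return X_b * c[:, None]

# Toy usage: two batches of the same 100 genes, where batch B carries a
# 3x multiplicative batch effect.
rng = np.random.default_rng(0)
batch_a = rng.normal(size=(100, 20))
batch_b = 3.0 * rng.normal(size=(100, 15))
merged = np.hstack([batch_a, scale_genes_for_batch_merge(batch_a, batch_b)])
print(merged.shape)  # (100, 35): both batches merged into one comparable dataset
```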
