Abstract

Dozens of normalization methods for correcting experimental variation and bias in high-throughput expression data have been developed during the last two decades. As many as 23 of them account for skewness of expression data between sample states, outnumbering the conventional methods such as loess and quantile normalization. From the perspective of reference selection, we classified the normalization methods for skewed expression data into three categories: data-driven reference, foreign reference, and entire gene set. Following these reference selection strategies, we separately introduced and summarized the normalization methods designed for gene expression data, both microarray and RNA-seq, with a global shift between compared conditions. To the best of our knowledge, this is the most comprehensive review of available preprocessing algorithms for unbalanced transcriptome data. The dissection and summary of these methods shed light on the understanding and appropriate application of preprocessing methods.

Highlights

  • The aim of normalization methods for large-scale expression data, including microarray and RNA-seq, is to eliminate systematic experimental bias and technical variation while preserving biological variation

  • Iterative Rank-Order Normalization (IRON) iteratively identifies the training set for fitting the regression curve, and selects the common reference array by performing all possible comparisons between arrays

  • Biological scaling normalization (BSN) is derived from the Trimmed Mean of M-values (TMM) normalization and shares its main idea: both use scaling factors to represent the change in total expression

Introduction

The aim of normalization methods for large-scale expression data, including microarray and RNA-seq, is to eliminate systematic experimental bias and technical variation while preserving biological variation. Quantile (Bolstad et al, 2003) and lowess (Berger et al, 2004) normalization are widely adopted for analyzing microarray expression data. Some methods employed for RNA-seq expression data, such as quantile (Bolstad et al, 2003) and median normalization (Anders and Huber, 2010), originate from microarray analysis (Zhou et al, 2015a; Sun et al, 2019).
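
To make concrete why conventional methods can struggle with a global shift, the following is a minimal sketch of quantile normalization in Python with NumPy. It is an illustration under stated assumptions, not the implementation from any cited paper or package: the function name `quantile_normalize` and the genes-by-samples input layout are hypothetical, and ties are broken arbitrarily rather than averaged.

```python
import numpy as np


def quantile_normalize(expr):
    """Quantile-normalize a genes x samples matrix (illustrative sketch).

    Every sample (column) is forced onto the same reference distribution,
    taken here as the mean of the sorted columns. Ties are broken by sort
    order instead of being averaged, which is a simplification.
    """
    expr = np.asarray(expr, dtype=float)
    # Sort order of the genes within each sample.
    order = np.argsort(expr, axis=0)
    sorted_vals = np.take_along_axis(expr, order, axis=0)
    # Reference distribution: mean across samples at each rank.
    reference = sorted_vals.mean(axis=1)
    # Write the reference value for each rank back to the gene holding it.
    normalized = np.empty_like(expr)
    np.put_along_axis(normalized, order, reference[:, None], axis=0)
    return normalized


# Hypothetical toy data: sample 2 is a tenfold global shift of sample 1.
toy = [[1, 10],
       [2, 20],
       [3, 30]]
print(quantile_normalize(toy))
```

In this toy input both columns come out identical after normalization, so the tenfold global difference between the two samples is removed along with any technical variation. That behavior is the reason skewed or globally shifted expression data call for the reference-based strategies reviewed here.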
