Abstract
Background: Microarray pre-processing usually consists of normalization and summarization. Normalization aims to remove non-biological variation across arrays. Normalization algorithms generally require the specification of reference and target arrays, yet the issue of reference selection has not been fully addressed. Summarization aims to estimate transcript abundance from normalized intensities. In this paper, we consider normalization and summarization jointly through a new strategy of reference selection.
Results: We propose a Probe-Treatment-Reference (PTR) model to streamline normalization and summarization by allowing multiple references. We estimate the model parameters by the Least Absolute Deviations (LAD) approach and implement the computation by median polishing. We show that the LAD estimator is robust in the sense that it has bounded influence in the three-factor PTR model. The model fit implicitly defines an "optimal reference" for each probe-set. We evaluate the effectiveness of the PTR method on two Affymetrix spike-in data sets. Our method reduces the variation of non-differentially expressed genes and thereby increases the power to detect differentially expressed genes.
Conclusion: Our results indicate that the reference effect is important and should be considered in microarray pre-processing. The proposed PTR method is a general framework for dealing with reference selection and can readily be applied to existing normalization algorithms such as the invariant-set, sub-array and quantile methods.
Highlights
Microarray pre-processing usually consists of normalization and summarization
The PTR method improves the performance of all three normalization algorithms
We explicitly address the reference issue in pre-processing expression microarrays and propose the PTR method to carry out normalization and summarization jointly
Summary
Microarray pre-processing usually consists of normalization and summarization. Normalization aims to remove non-biological variation across arrays. Summarization aims to estimate transcript abundance from normalized intensities. We consider normalization and summarization jointly through a new strategy of reference selection. The Affymetrix expression microarray is the most widely used platform. It represents each gene with 11–20 probes, each 25 oligonucleotide bases long; together these probes are called a probe-set. For each perfect-match (PM) probe, some expression arrays also include a mis-match (MM) probe that differs only in the middle (13th) base. Image processing yields a fluorescence intensity for each probe. Estimating gene expression from these probe intensities is a statistical problem to which much effort has been devoted.
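The abstract states that the LAD fit is computed by median polishing. As a rough illustration of that idea, the sketch below runs Tukey's standard two-way median polish on a probe-by-array table of log intensities for a single probe-set. It is not the three-factor PTR fit, whose details are not given in this section; the function name `median_polish`, its parameters and the toy numbers are all assumptions made for the example.

```python
from statistics import median


def median_polish(table, max_iter=10, tol=1e-8):
    """Two-way median polish (illustrative sketch, not the PTR fit).

    Decomposes table[i][j] into overall + row[i] + col[j] + resid[i][j],
    where rows are probes and columns are arrays (an assumed layout).
    """
    n_rows, n_cols = len(table), len(table[0])
    resid = [[float(x) for x in r] for r in table]
    overall = 0.0
    row = [0.0] * n_rows
    col = [0.0] * n_cols

    for _ in range(max_iter):
        # Sweep the median out of every row into the row effects.
        row_med = [median(r) for r in resid]
        for i in range(n_rows):
            row[i] += row_med[i]
            resid[i] = [x - row_med[i] for x in resid[i]]
        # Re-centre the column effects; their median goes to the overall term.
        shift = median(col)
        overall += shift
        col = [c - shift for c in col]

        # Sweep the median out of every column into the column effects.
        col_med = [median(resid[i][j] for i in range(n_rows))
                   for j in range(n_cols)]
        for j in range(n_cols):
            col[j] += col_med[j]
            for i in range(n_rows):
                resid[i][j] -= col_med[j]
        # Re-centre the row effects in the same way.
        shift = median(row)
        overall += shift
        row = [r - shift for r in row]

        # Stop once a full pass no longer moves anything.
        if max(abs(m) for m in row_med + col_med) < tol:
            break

    return overall, row, col, resid


# Toy probe-set: 3 probes (rows) x 3 arrays (columns), made-up log2 values.
toy = [[9.0, 10.0, 11.5],
       [10.0, 11.0, 12.5],
       [12.0, 13.0, 14.5]]
overall, probe_eff, array_eff, resid = median_polish(toy)
```

In RMA-style summarization, `overall + array_eff[j]` would be read as the log-scale expression estimate for array `j`. Because each sweep uses medians rather than means, a single outlying probe intensity has limited effect on the fitted terms, which is the sense in which such a fit is robust.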