Abstract

Background

Gene expression microarray technologies are widely used across most areas of biological and medical research. Comparing and integrating microarray data from different experiments would be very useful, but is currently very challenging due to differences in experimental and hybridization conditions, as well as in data preprocessing and normalization methods. Furthermore, even in the case of the widely used, industry-standard Affymetrix oligonucleotide microarrays, the various array generations have different probe sets representing different genes, which hinders data integration.

Results

In this study our objective is to find systematic approaches to normalize the data emerging from different Affymetrix array generations and from different laboratories. We compare and assess the accuracy of five normalization methods for Affymetrix gene expression data using 6,926 Affymetrix experiments from five array generations. The methods that we compare include 1) standardization, 2) housekeeping gene based normalization, 3) equalized quantile normalization, 4) Weibull distribution based normalization and 5) array generation based gene centering. Our results indicate that the best results are achieved when the data is normalized first within a sample and then between samples with Array Generation based gene Centering (AGC) normalization.

Conclusion

We conclude that integrating different Affymetrix datasets with the AGC method yields values that are significantly more comparable across the array generations than when no array generation based normalization is used. The AGC method was found to be the best method for normalizing the data from several different array generations and for achieving comparable gene values across thousands of samples.

Highlights

  • Gene expression microarray technologies are widely used across most areas of biological and medical research

  • The methods that we compared include 1) standardization (Z), 2) housekeeping gene based normalization (HK), 3) equalized quantile normalization (Q), 4) Weibull distribution based normalization (WBL) and 5) array generation based gene centering (AGC). These were tested in the following ten combinations: preprocessed data without any further normalization (MAS), Z, HK, Q and WBL, and each of these combined with the AGC method: MASAGC, ZAGC, HKAGC, QAGC and WBLAGC

  • We estimated the degree of comparability between data from different array generations in five ways: 1) correlation between technical replicates, 2) correlation between randomly selected genes, 3) classification of the samples based on the anatomical classes, 4) comparison of correlations between the samples computed based on the anatomical classes and array generations, 5) stability of the housekeeping genes
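
The two-step scheme named in the highlights (normalize within a sample, then center genes per array generation) can be illustrated with a minimal sketch. The text here does not give the exact formulas, so the code assumes that standardization means a z-score over the genes of each sample and that AGC means subtracting, for each gene, its mean over the samples of one array generation. The function names, the generation labels and the toy data are hypothetical and are not taken from the paper.

```python
import numpy as np


def standardize_within_sample(expr):
    """Z-score each sample (column) over its genes (rows).

    Assumption: this is what "standardization (Z)" refers to.
    """
    mean = expr.mean(axis=0, keepdims=True)
    std = expr.std(axis=0, keepdims=True)
    return (expr - mean) / std


def center_genes_by_generation(expr, generations):
    """For each gene, subtract its mean over the samples of each generation.

    Assumption: a plain mean-subtraction reading of "array generation
    based gene centering (AGC)"; the published method may differ.
    """
    generations = np.asarray(generations)
    centered = expr.astype(float).copy()
    for gen in np.unique(generations):
        cols = generations == gen
        gene_means = centered[:, cols].mean(axis=1, keepdims=True)
        centered[:, cols] = centered[:, cols] - gene_means
    return centered


# Toy data: 5 genes x 8 samples from two hypothetical array generations.
rng = np.random.default_rng(0)
generations = np.array(["gen_a"] * 4 + ["gen_b"] * 4)
expr = rng.normal(loc=8.0, scale=1.0, size=(5, 8))

# Simulate a gene-specific shift that affects only the second generation.
expr[:, generations == "gen_b"] += rng.normal(size=(5, 1))

z = standardize_within_sample(expr)                  # within-sample step
zagc = center_genes_by_generation(z, generations)    # between-sample step

for gen in np.unique(generations):
    per_gene_mean = zagc[:, generations == gen].mean(axis=1)
    print(gen, np.allclose(per_gene_mean, 0.0))
```

After the second step every gene has mean zero within each generation, so a generation-specific shift in a gene's level no longer separates the samples. Under these assumptions this corresponds to the ZAGC combination listed above.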


Summary

Introduction

Gene expression microarray technologies are widely used across most areas of biological and medical research. A Celsius data warehousing system aggregates Affymetrix CEL files and associated metadata [6]. These studies have included several thousands of samples from separate studies. Since different array types and normalization methods have typically been used in each study, integrating and directly comparing the samples is difficult. Most of these meta-analyses are performed one study at a time, with the results then summed together. Some publications describe the integration of data between different Affymetrix array generations. These methods are often based on the normalization of oligonucleotide microarray data using sequence overlaps between the individual oligos on the same slide [7,8,9]. In comparisons across multiple platforms, the number of informative genes is significantly reduced.
