Abstract

BackgroundAffymetrix GeneChips and Illumina BeadArrays are the most widely used commercial single channel gene expression microarrays. Public data repositories are an extremely valuable resource, providing array-derived gene expression measurements from many thousands of experiments. Unfortunately many of these studies are underpowered and it is desirable to improve power by combining data from more than one study; we sought to determine whether platform-specific bias precludes direct integration of probe intensity signals for combined reanalysis.ResultsUsing Affymetrix and Illumina data from the microarray quality control project, from our own clinical samples, and from additional publicly available datasets we evaluated several approaches to directly integrate intensity level expression data from the two platforms. After mapping probe sequences to Ensembl genes we demonstrate that, ComBat and cross platform normalisation (XPN), significantly outperform mean-centering and distance-weighted discrimination (DWD) in terms of minimising inter-platform variance. In particular we observed that DWD, a popular method used in a number of previous studies, removed systematic bias at the expense of genuine biological variability, potentially reducing legitimate biological differences from integrated datasets.ConclusionNormalised and batch-corrected intensity-level data from Affymetrix and Illumina microarrays can be directly combined to generate biologically meaningful results with improved statistical power for robust, integrated reanalysis.

Highlights

  • Affymetrix GeneChips and Illumina BeadArrays are the most widely used commercial single channel gene expression microarrays

  • Direct cross-platform integration of MAQC data The Microarray Quality Control (MAQC) consortium [18] investigated the reproducibility of microarray-derived gene expression measurements by assessing performance across platforms, chips, and processing sites using a titration of Universal Human Reference RNA (UHRR) and Human Brain Reference RNA (UBRR)

  • Adjusting for the platform differences using the meancentring method [5] provided only a marginal improvement compared to uncorrected data, whilst the Distance Weighted Discrimination (DWD) method [20] suppressed the platform-specific bias and legitimate biological variability between samples (Figure 2 and Additional file 1A)

Read more

Summary

Introduction

Affymetrix GeneChips and Illumina BeadArrays are the most widely used commercial single channel gene expression microarrays. Public data repositories are an extremely valuable resource, providing array-derived gene expression measurements from many thousands of experiments. Many of these studies are underpowered and it is desirable to improve power by combining data from more than one study; we sought to determine whether platform-specific bias precludes direct integration of probe intensity signals for combined reanalysis. Systematic review is a valuable tool to synthesise high-quality empirical evidence from independent investigations in order to determine a consensus view Such reviews, or meta-analyses have greater statistical power to identify true effects from study-specific artefacts and, as such, are capable of identifying subtle effects that might be missed or deemed insignificant in smaller datasets. Due to the reduced patient-patient variation, these studies can be highly effective for identifying consistent geneexpression changes, such as the effects of (neoadjuvant) cancer treatment

Methods
Results
Discussion
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call