Abstract
Background
Simultaneous measurement of gene expression on a genomic scale can be accomplished using microarray technology or by sequencing-based methods. Researchers who perform high-throughput gene expression assays often deposit their data in public databases, but heterogeneity of measurement platforms leads to challenges for the combination and comparison of data sets. Researchers wishing to perform cross-platform normalization face two major obstacles. First, a choice must be made about which method or methods to employ. Nine are currently available, and no rigorous comparison exists. Second, software for the selected method must be obtained and incorporated into a data analysis workflow.
Results
Using two publicly available cross-platform testing data sets, cross-platform normalization methods are compared based on inter-platform concordance and on the consistency of gene lists obtained with transformed data. Scatter and ROC-like plots are produced, and new statistics based on those plots are introduced to measure the effectiveness of each method. Bootstrapping is employed to obtain distributions for those statistics. The consistency of platform effects across studies is explored theoretically and with respect to the testing data sets.
Conclusions
Our comparisons indicate that four methods, DWD, EB, GQ, and XPN, are generally effective, while the remaining methods do not adequately correct for platform effects. Of the four successful methods, XPN generally shows the highest inter-platform concordance when treatment groups are equally sized, while DWD is most robust to differently sized treatment groups and consistently shows the smallest loss in gene detection. We provide an R package, CONOR, capable of performing the nine cross-platform normalization methods considered. The package can be downloaded at http://alborz.sdsu.edu/conor and is available from CRAN.
Highlights
Simultaneous measurement of gene expression on a genomic scale can be accomplished using microarray technology or by sequencing-based methods
Researchers who perform high-throughput gene expression assays often deposit their data in public databases such as ArrayExpress [11] and Gene Expression Omnibus (GEO) [12], the latter of which currently houses 630,845 assays distributed among 9,348 platforms
In this paper we provide a comparison of available methods based on the MicroArray Quality Control (MAQC) project [17] data set and a human sperm data set [44] containing data from multiple platforms
Summary
Simultaneous measurement of gene expression on a genomic scale can be accomplished using microarray technology or by sequencing-based methods. Researchers who perform high-throughput gene expression assays often deposit their data in public databases, but heterogeneity of measurement platforms leads to challenges for the combination and comparison of data sets. While next-generation sequencing seems likely to replace microarrays for expression analysis in the near future, the large amount of microarray data already in existence could continue to be useful to researchers for many years to come. These characteristics can affect microarray performance [7,8,9,10]. The use of linkers to reduce steric hindrance, as employed by the Applied Biosystems and Illumina platforms in Table 1, is one method for increasing the sensitivity of short probes. The method by which probes are constructed and attached, and the overall construction of the array, can affect probe uniformity and intra-platform