Abstract

Background
Microarrays have become a routine tool to address diverse biological questions. Consequently, several manufacturers have produced different types and generations of microarrays over time. Likewise, the diversity of raw data deposited in public databases such as NCBI GEO or EBI ArrayExpress has grown enormously. These databases now contain several hundred thousand microarray samples, clustered by species, manufacturer and chip generation. One of the original goals of these databases was to make the data available to other researchers for independent analysis and, where appropriate, integration with their own data, yet current software implementations could not provide that feature. Only data sets generated on the same chip platform can be readily combined, and even then batch effects have to be taken care of. A straightforward approach to deal with multiple chip types and batch effects has been missing. The software presented here was designed to solve both of these problems in a convenient and user-friendly way.

Results
The virtualArray software package can combine raw data sets from almost any chip type, based on current annotations from NCBI GEO or Bioconductor. After establishing congruent annotations for the raw data, virtualArray can directly employ one of seven implemented methods to adjust for batch effects resulting from differences between the chip types used. Both steps can be tuned to the preferences of the user. When the run is finished, the whole dataset is presented as a conventional Bioconductor “ExpressionSet” object, which can be used as input to other Bioconductor packages.

Conclusions
Using this software package, researchers can easily integrate their own microarray data with data from public repositories or other sources that are based on different microarray chip types. Using the default approach, a robust and up-to-date batch effect correction technique is applied to the data.

Highlights

  • Microarrays have become a routine tool to address diverse biological questions

  • Considering the amount of data and the number of platforms already available, we believe it is becoming increasingly important to cross-compare data generated by different research groups. This has mostly been done via meta-analysis studies, such as the MicroArray Quality Control (MAQC) consortium study I, which compared the outcomes of different microarray projects [4,5]

  • Raw data from the studies were pulled from the NCBI Gene Expression Omnibus (GEO) database


Summary

Results

Combining three human microarray studies from different platforms using defaults (example 1)

In order to demonstrate an application of the package, a consistent dataset is compiled out of three different previously published studies carried out on Affymetrix, Agilent and Illumina platforms, respectively.

If a GPL code is not available, or the source of the data is not NCBI GEO, an additional step is required to derive correct annotations. An example of this is shown in Additional file 1.

This additional information can be provided in a column in the “pData” slot common to all single ExpressionSets (e.g. hand over the parameter “covars=c('Batch','celltype')”). Another way to store this information would be a data.frame or tab-delimited text file holding a “sample_info” table (hand over the parameter “sampleinfo=”; see Table 2 for an example). In the following example we will hand over the “sampleinfo='create'” parameter to the “virtualArrayExpressionSets” function to pass on the information. During this run, virtualArray will prompt for a modification of the “sample_info.txt” file.

All fibroblasts have become clearly distinct from the iPSCs and ESCs, while adult or dermal fibroblasts become distinct from neonatal or foreskin fibroblasts in this setting, indicating an increase in resolution.
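The two steps described here (bringing platform-specific probe identifiers to a common annotation, then adjusting for platform-related batch effects) can be illustrated with a small conceptual sketch. It is written in Python with toy data and is not the virtualArray implementation: the function names `collapse_to_symbols` and `combine_platforms`, the probe IDs and the expression values are all hypothetical, and per-batch mean centering is used only as a simple stand-in for the batch adjustment methods the package offers.

```python
from statistics import mean, median


def collapse_to_symbols(expr, probe_to_symbol):
    """Collapse probe-level rows to one row per gene symbol.

    expr maps probe ID -> list of expression values (one per sample).
    Probes sharing a symbol are summarized sample by sample with the median.
    """
    grouped = {}
    for probe, values in expr.items():
        symbol = probe_to_symbol.get(probe)
        if symbol is None:
            continue  # probe without annotation is dropped
        grouped.setdefault(symbol, []).append(values)
    return {
        symbol: [median(column) for column in zip(*rows)]
        for symbol, rows in grouped.items()
    }


def combine_platforms(datasets):
    """Merge per-platform data on shared gene symbols.

    datasets maps batch name -> {symbol: values}. Each gene is centered
    within each batch (illustrative stand-in for a batch adjustment),
    then the batches are concatenated in sorted batch-name order.
    """
    shared = set.intersection(*(set(d) for d in datasets.values()))
    combined = {}
    for gene in sorted(shared):
        row = []
        for batch in sorted(datasets):
            values = datasets[batch][gene]
            batch_mean = mean(values)
            row.extend(v - batch_mean for v in values)
        combined[gene] = row
    return combined


# Toy data: two hypothetical platforms with different probe identifiers.
affy = collapse_to_symbols(
    {"p1": [8.0, 10.0], "p2": [10.0, 12.0], "p3": [5.0, 7.0]},
    {"p1": "POU5F1", "p2": "POU5F1", "p3": "COL1A1"},
)
illumina = collapse_to_symbols(
    {"i1": [3.0, 5.0], "i2": [1.0, 2.0]},
    {"i1": "POU5F1", "i2": "NANOG"},
)
combined = combine_platforms({"affy": affy, "illumina": illumina})
print(combined)
```

Only POU5F1 is present on both toy platforms, so it is the only gene in the combined result. Mean centering assumes that the sample groups are balanced across batches; when a cell type occurs on only one platform, centering would also remove real biological differences, which is one reason to supply covariate information such as the “sample_info” table.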
