Abstract

Motivation: Microarray results accumulated in public repositories are widely reused in meta-analytical studies and secondary databases. The quality of the data obtained with this technology varies from experiment to experiment, and an efficient method for quality assessment is necessary to ensure their reliability.

Results: The lack of a good benchmark has hampered evaluation of existing methods for quality control. In this study, we propose a new independent quality metric that is based on evolutionary conservation of expression profiles. Using 11 large organ-specific datasets, we show that IQRray, a new quality metric that we developed, exhibits the highest correlation with this reference metric among the 14 metrics tested. IQRray outperforms other methods in identifying poor-quality arrays in datasets composed of arrays from many independent experiments. In contrast, methods designed for detecting outliers within a single experiment, such as Normalized Unscaled Standard Error and Relative Log Expression, performed poorly, both because they cannot detect datasets containing only low-quality arrays and because their scores cannot be directly compared between experiments.

Availability and implementation: The R implementation of IQRray is available at: ftp://lausanne.isb-sib.ch/pub/databases/Bgee/general/IQRray.R.

Contact: Marta.Rosikiewicz@unil.ch

Supplementary information: Supplementary data are available at Bioinformatics online.

Highlights

  • Thousands of microarray results are available in public repositories such as the Gene Expression Omnibus (Edgar et al., 2002) and ArrayExpress (Brazma et al., 2003).

  • Combining results from several independent studies allows improved detection of differentially expressed genes, and analysis of biological pathways and of co-expression networks (Tseng et al., 2012). These vast transcriptomic resources have been extensively used for functional gene annotation and reanalysis of lists of candidate genes obtained with high-throughput experiments. These tasks are facilitated by large secondary databases such as Genevestigator (Hruz et al., 2008), BioGPS (Wu et al., 2013), the Gene Expression Atlas (Kapushesky et al., 2010) or Bgee (Bastian et al., 2008), which allow mining of many microarray experiments at the same time.

  • Because of the limitations of available methods, we propose a new method for multi-experiment quality control.

Introduction

Thousands of microarray results are available in public repositories such as the Gene Expression Omnibus (Edgar et al., 2002) and ArrayExpress (Brazma et al., 2003). This wealth of expression data, covering many organisms, tissues, developmental stages, diseases and treatments, is available for meta-analyses, systems biology studies and use in secondary databases. Combining results from several independent studies allows improved detection of differentially expressed genes, and analysis of biological pathways and of co-expression networks (Tseng et al., 2012). These vast transcriptomic resources have been extensively used for functional gene annotation and reanalysis of lists of candidate genes obtained with high-throughput experiments. These tasks are facilitated by large secondary databases such as Genevestigator (Hruz et al., 2008), BioGPS (Wu et al., 2013), the Gene Expression Atlas (Kapushesky et al., 2010) or Bgee (Bastian et al., 2008), which allow mining of many microarray experiments at the same time. There are many more specialized databases, which for example collect data only from a selected species (Dash et al., 2012; Le Crom et al., 2002) or for diseases (Hebestreit et al., 2012; Rhodes et al., 2007), or provide resources for more specific analyses, such as COXPRESdb for studying coexpressed genes (Obayashi et al., 2013) or TiSGeD for the analysis of tissue-specific gene expression (Xiao et al., 2010).
