Abstract

Background: Over the past decade, gene expression microarray studies have greatly expanded our knowledge of the genetic mechanisms of human diseases. Meta-analysis of the substantial amounts of accumulated data, which integrates valuable information from multiple studies, is becoming increasingly important in microarray research. However, collecting data of specific interest from public microarray repositories often presents major practical problems. Moreover, including low-quality data may significantly reduce the efficiency of a meta-analysis.

Results: M2DB is a human-curated microarray database designed for easy querying based on clinical information and for interactive retrieval of either raw or uniformly pre-processed data, along with a set of quality-control (QC) metrics. The database contains more than 10,000 previously published Affymetrix GeneChip arrays generated from human clinical specimens. M2DB allows online querying with flexible combinations of five clinical annotations describing disease state and sampling location. These annotations were manually curated using controlled vocabularies, based on information obtained from GEO, ArrayExpress, and published papers. For array-level quality assessment, the online query provides sets of QC metrics generated by three available QC algorithms, so arrays with poor data quality can easily be excluded through the query interface. The query also provides values from two algorithms for gene-based filtering, as well as raw data and three kinds of pre-processed data for download.

Conclusion: M2DB offers a user-friendly interface to QC parameters, sample clinical annotations, and data formats, helping users obtain clinical metadata. The database lowers the entry threshold for meta-analysis and integrates its workflow. We hope that this work will promote the further evolution of microarray meta-analysis.
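The query workflow described above (select arrays by a combination of clinical annotations, then drop arrays with poor QC results) can be sketched in a few lines of Python. This is a minimal illustration, not M2DB's implementation: the `ArrayRecord` schema, the `query` helper, the placeholder QC algorithm names `qc_a`/`qc_b`/`qc_c`, the assumed rule that an array must pass all three QC checks by default, and the `GSM_EXAMPLE_*` identifiers are all hypothetical. Only three of the five clinical annotation fields are named in the text, so only those three appear.

```python
from dataclasses import dataclass, field


@dataclass
class ArrayRecord:
    """One array with curated clinical annotations and QC results.

    Hypothetical schema for illustration only; not M2DB's actual data model.
    """
    sample_id: str
    platform: str
    disease_state: str        # e.g. "Disease" or "Abnormality"
    disease_state_suppl: str  # e.g. the disease name for a "Disease" specimen
    disease_location: str     # anatomical site of the disease or primary cancer site
    # One pass/fail flag per QC algorithm (placeholder algorithm names).
    qc_pass: dict[str, bool] = field(default_factory=dict)


def query(records, min_qc_passes=3, **criteria):
    """Return arrays matching every given annotation value and passing
    at least `min_qc_passes` QC checks (assumed rule: all three by default)."""
    hits = []
    for rec in records:
        # Flexible combination: any subset of annotation fields may be given.
        if any(getattr(rec, name) != value for name, value in criteria.items()):
            continue
        # Exclude arrays that fail too many of the QC algorithms.
        if sum(rec.qc_pass.values()) < min_qc_passes:
            continue
        hits.append(rec)
    return hits


# Made-up example arrays; identifiers and values are not real M2DB entries.
arrays = [
    ArrayRecord("GSM_EXAMPLE_1", "HG-U133A", "Disease", "Adenocarcinoma", "Lung",
                {"qc_a": True, "qc_b": True, "qc_c": True}),
    ArrayRecord("GSM_EXAMPLE_2", "HG-U133 Plus 2.0", "Disease", "Adenocarcinoma", "Lung",
                {"qc_a": True, "qc_b": False, "qc_c": True}),
    ArrayRecord("GSM_EXAMPLE_3", "HG-U133A", "Abnormality", "Hyperplasia", "Lung",
                {"qc_a": True, "qc_b": True, "qc_c": True}),
]

selected = query(arrays, disease_state="Disease", disease_location="Lung")
print([rec.sample_id for rec in selected])  # ['GSM_EXAMPLE_1']
```

Lowering `min_qc_passes` relaxes the QC filter: with `min_qc_passes=2` and only `disease_location="Lung"`, all three example arrays would be returned.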

Highlights

  • Over the past decade, gene expression microarray studies have greatly expanded our knowledge of genetic mechanisms of human diseases

  • The completeness and accuracy of information largely depend on the meticulousness of authors, and this issue constitutes a major challenge for microarray meta-analysis

  • This study develops M2DB, an expert-curated database, to address the problems of microarray data retrieval, annotation, pre-processing, and quality assessment


Summary

Background

Rapid accumulation of vast amounts of microarray data in public databases such as Gene Expression Omnibus (GEO) [1] and ArrayExpress [2] over the past few years has made it possible to retrieve, integrate, and compare microarray results from many datasets [3,4]. M2DB contains more than 10,000 previously published Affymetrix arrays, re-annotated according to the available clinical information with controlled vocabularies drawn from ontologies (mostly the NCI Thesaurus). M2DB provides human-curated annotations, raw data, uniformly pre-processed data, and sets of QC metrics, and it significantly improves the quality and comparability of microarray metadata generated by different laboratories.

Dataset collection and pre-processing

Raw intensity data (CEL files) generated using the Affymetrix HG-U133A and HG-U133 Plus 2.0 platforms were …

Sample annotation

To ensure annotation consistency and make retrieval more efficient, the clinical information for each sample was manually curated based on data obtained from GEO, ArrayExpress, and published papers. “Disease State Suppl.” contains supplementary information for “Disease State”: for example, the disease name for a “Disease” specimen or the reasons why a sample is annotated as “Abnormality.” “Disease Location” specifies the anatomical location of a disease or the primary site of a cancer.

… The other email, containing a download link, is sent when the job is completed.
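A curated sample annotation of the kind described above can be checked against a controlled vocabulary before it is stored. The sketch below assumes a hypothetical vocabulary for “Disease State” (“Disease” and “Abnormality” come from the text; “Normal” is an assumed third term) and two assumed consistency rules: “Disease” and “Abnormality” samples must fill in “Disease State Suppl.”, and “Disease” samples must give a “Disease Location”. The `validate_annotation` helper is hypothetical and only illustrates the idea; M2DB's curation was done manually, and its actual rules are not described here.

```python
# Hypothetical controlled vocabulary for the "Disease State" field.
# "Disease" and "Abnormality" appear in the text; "Normal" is an assumption.
DISEASE_STATES = {"Normal", "Disease", "Abnormality"}


def validate_annotation(record: dict) -> list[str]:
    """Return the problems found in one curated sample annotation (empty if none)."""
    problems = []
    state = record.get("Disease State", "")
    if state not in DISEASE_STATES:
        problems.append(f"unknown Disease State: {state!r}")
    # Assumed rule: "Disease" and "Abnormality" samples need supplementary
    # information (the disease name, or the reason for the abnormality).
    if state in {"Disease", "Abnormality"} and not record.get("Disease State Suppl.", "").strip():
        problems.append("Disease State Suppl. is required for this Disease State")
    # Assumed rule: a "Disease" sample must state where the disease is located.
    if state == "Disease" and not record.get("Disease Location", "").strip():
        problems.append("Disease Location is required for a Disease sample")
    return problems


print(validate_annotation({"Disease State": "Disease",
                           "Disease State Suppl.": "Breast carcinoma",
                           "Disease Location": "Breast"}))  # []
print(validate_annotation({"Disease State": "Abnormality"}))
# ['Disease State Suppl. is required for this Disease State']
print(validate_annotation({"Disease State": "Tumor"}))
# ["unknown Disease State: 'Tumor'"]
```

Restricting free-text terms such as “Tumor” to a fixed vocabulary is what lets a later query match every relevant sample with a single value.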
