Abstract

The ArrayExpress Archive of Functional Genomics Data (http://www.ebi.ac.uk/arrayexpress) is an international functional genomics database at the European Bioinformatics Institute (EMBL-EBI) recommended by most journals as a repository for data supporting peer-reviewed publications. It contains data from over 7000 public sequencing and 42 000 array-based studies comprising over 1.5 million assays in total. The proportion of sequencing-based submissions has grown significantly over the last few years and has doubled in the last 18 months, whilst the rate of microarray submissions is growing slightly. All data in ArrayExpress are available in the MAGE-TAB format, which allows robust linking to data analysis and visualization tools and standardized analysis. The main development over the last two years has been the release of a new data submission tool, Annotare, which has reduced the average submission time almost 3-fold. In the near future, Annotare will become the only submission route into ArrayExpress, alongside MAGE-TAB format-based pipelines. ArrayExpress is a stable and highly accessed resource. Our future tasks include automation of data flows and further integration with other EMBL-EBI resources for the representation of multi-omics data.

Highlights

  • The ArrayExpress Archive of Functional Genomics Data is an international functional genomics database at the European Bioinformatics Institute (EMBL-EBI) recommended by most journals as a repository for data supporting peer-reviewed publications

  • Alongside Gene Expression Omnibus (GEO) (3), it is recommended by major journals to store data supporting relevant peer-reviewed publications

  • To facilitate reproducible research (4), we promote compliance of data with the Minimum Information About a Microarray Experiment (MIAME) (5) or Minimum Information about Sequencing Experiment (MINSEQE; http://www.fged.org/projects/minseqe/) guidelines, and each submission is automatically scored against these criteria, allowing users to quickly identify high-quality data sets


Summary

ANNOTARE SUBMISSION TOOL

A new submission tool based on the community-developed microarray data annotation tool Annotare (12), optimized for supporting microarray as well as HTS-based data submissions, was released at the beginning of 2014. Annotare uploads the data files from the submitter’s directory and captures experimental metadata through a series of spreadsheet-based web forms (see Figure 1), guiding the submitter step by step through constructing a submission. A built-in validation step checks all the information and files provided before the submission is executed. It catches errors such as missing data files for an assay or the absence of attributes for samples, at which point the submitter can make amendments. Annotare generates MAGE-TAB files, which contain the experiment’s metadata, and submits these together with the data files to ArrayExpress, at which point an accession number is provided to the submitter. Mandatory fields are clearly indicated, allowing submitters to correct most metadata issues prior to submission and speeding up the process. As the decrease in submission times and the user feedback suggest, the introduction of Annotare has significantly simplified and sped up the submission process, particularly for users without expert bioinformatics support, who remain a significant proportion of the depositors.
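The kind of pre-submission check described above can be illustrated with a minimal Python sketch. It is not Annotare's actual implementation: the function `validate_sdrf`, the sample rows and the file names are hypothetical, and the only thing taken from MAGE-TAB is the style of the column headers (`Source Name`, `Characteristics[...]`, `Assay Name`, `Array Data File`). The sketch flags samples with empty attributes and assays whose data file is missing or was not uploaded.

```python
import csv
import io

# Hypothetical SDRF-like table; column names follow MAGE-TAB conventions,
# but the rows and file names are invented for illustration.
SDRF_TEXT = (
    "Source Name\tCharacteristics[organism]\tAssay Name\tArray Data File\n"
    "sample 1\tHomo sapiens\tassay 1\tassay1.CEL\n"
    "sample 2\t\tassay 2\tassay2.CEL\n"
    "sample 3\tHomo sapiens\tassay 3\t\n"
)


def validate_sdrf(sdrf_text, uploaded_files):
    """Return a list of human-readable problems found in an SDRF-like table."""
    problems = []
    reader = csv.DictReader(io.StringIO(sdrf_text), delimiter="\t")
    for row_number, row in enumerate(reader, start=1):
        sample = (row.get("Source Name") or "").strip() or f"row {row_number}"
        # Every Characteristics[...] column must be filled in for each sample.
        for column, value in row.items():
            if column and column.startswith("Characteristics["):
                if not (value or "").strip():
                    problems.append(f"{sample}: missing value for {column}")
        # Every assay must reference a data file that was actually uploaded.
        data_file = (row.get("Array Data File") or "").strip()
        if not data_file:
            problems.append(f"{sample}: no data file listed for assay")
        elif data_file not in uploaded_files:
            problems.append(f"{sample}: data file {data_file} was not uploaded")
    return problems


# Only one of the referenced files has been uploaded in this example.
problems = validate_sdrf(SDRF_TEXT, uploaded_files={"assay1.CEL"})
for problem in problems:
    print(problem)
```

Returning a list of problems rather than stopping at the first error mirrors the workflow described in the text, where the submitter sees what needs amending and fixes it before the submission is executed.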

OTHER DEVELOPMENTS
FUTURE DEVELOPMENTS
