MetaSRA: normalized human sample-specific metadata for the Sequence Read Archive.

Matthew N Bernstein, Anhai Doan, Colin N Dewey

doi:10.1093/bioinformatics/btx334


Open Access

https://doi.org/10.1093/bioinformatics/btx334


Journal: Bioinformatics	Publication Date: May 23, 2017
Citations: 87	License type: CC BY-NC 4.0

Affiliation: University of Wisconsin–Madison

Abstract

Motivation: The NCBI’s Sequence Read Archive (SRA) promises great biological insight if one could analyze the data in the aggregate; however, the data remain largely underutilized, in part, due to the poor structure of the metadata associated with each sample. The rules governing submissions to the SRA do not dictate a standardized set of terms that should be used to describe the biological samples from which the sequencing data are derived. As a result, the metadata include many synonyms, spelling variants and references to outside sources of information. Furthermore, manual annotation of the data remains intractable due to the large number of samples in the archive. For these reasons, it has been difficult to perform large-scale analyses that study the relationships between biomolecular processes and phenotype across diverse diseases, tissues and cell types present in the SRA.

Results: We present MetaSRA, a database of normalized SRA human sample-specific metadata following a schema inspired by the metadata organization of the ENCODE project. This schema involves mapping samples to terms in biomedical ontologies, labeling each sample with a sample-type category, and extracting real-valued properties. We automated these tasks via a novel computational pipeline.

Availability and implementation: The MetaSRA is available at metasra.biostat.wisc.edu via both a searchable web interface and bulk downloads. Software implementing our computational pipeline is available at http://github.com/deweylab/metasra-pipeline.

Supplementary information: Supplementary data are available at Bioinformatics online.
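The three schema components named in the abstract (ontology term mappings, a sample-type category, and real-valued properties) can be pictured with a small Python sketch. This is an illustration only, not the authors' pipeline: the `NormalizedSample` class, the `normalize` function, the `TOY:` term identifiers, the synonym table, and the key-based sample-type rule are all hypothetical stand-ins chosen for this example.

```python
import re
from dataclasses import dataclass, field

# Hypothetical synonym table mapping raw strings to placeholder term IDs.
# Real ontologies and identifiers are NOT used here.
TOY_SYNONYMS = {
    "liver": "TOY:0001",
    "hepatic tissue": "TOY:0001",
    "breast cancer": "TOY:0002",
    "breast carcinoma": "TOY:0002",
}

# Matches values such as "45 years" or "2.5 h" (a number followed by a unit).
NUMBER_WITH_UNIT = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([A-Za-z]+)\s*$")


@dataclass
class NormalizedSample:
    """Illustrative record holding the three schema components."""
    ontology_terms: set = field(default_factory=set)
    sample_type: str = "unknown"
    real_value_properties: dict = field(default_factory=dict)


def normalize(raw):
    """Normalize a dict of raw key-value metadata (toy logic only)."""
    sample = NormalizedSample()
    keys = {key.strip().lower() for key in raw}
    for key, value in raw.items():
        text = value.strip().lower()
        # 1. Map synonyms and spelling variants to a single term ID.
        if text in TOY_SYNONYMS:
            sample.ontology_terms.add(TOY_SYNONYMS[text])
            continue
        # 2. Extract real-valued properties as (value, unit) pairs.
        match = NUMBER_WITH_UNIT.match(value)
        if match:
            number, unit = float(match.group(1)), match.group(2).lower()
            sample.real_value_properties[key.strip().lower()] = (number, unit)
    # 3. Assign a sample-type category with a placeholder key-based rule.
    if "cell line" in keys:
        sample.sample_type = "cell line"
    elif "tissue" in keys:
        sample.sample_type = "tissue"
    return sample


if __name__ == "__main__":
    raw = {"tissue": "Hepatic Tissue", "age": "45 years", "note": "see paper"}
    print(normalize(raw))
```

In this sketch, "Hepatic Tissue" and "liver" resolve to the same placeholder term, which is the kind of synonym collapsing the abstract motivates; free text such as "see paper" is left unmapped.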


