MetaBar - a tool for consistent contextual data acquisition and standards compliant submission

Wolfgang Hankeln, Pier Luigi Buttigieg, Pelin Yilmaz, Frank Oliver Glöckner, Dennis Fink, Renzo Kottmann

doi:10.1186/1471-2105-11-358

Abstract

Background: Environmental sequence datasets are increasing at an exponential rate; however, the vast majority of them lack appropriate descriptors such as sampling location, time and depth/altitude, generally referred to as metadata or contextual data. The consistent capture and structured submission of these data is crucial for integrated data analysis and ecosystems modeling. The application MetaBar has been developed to support consistent contextual data acquisition.

Results: MetaBar is a spreadsheet and web-based software tool designed to assist users in the consistent acquisition, electronic storage, and submission of contextual data associated with their samples. A preconfigured Microsoft® Excel® spreadsheet is used to initiate structured contextual data storage in the field or laboratory. Each sample is given a unique identifier, and at any stage the sheets can be uploaded to the MetaBar database server. To label samples, identifiers can be printed as barcodes. An intuitive web interface provides quick access to the contextual data in the MetaBar database as well as user and project management capabilities. Export functions facilitate contextual and sequence data submission to the International Nucleotide Sequence Database Collaboration (INSDC), comprising the DNA Data Bank of Japan (DDBJ), the European Molecular Biology Laboratory database (EMBL) and GenBank. MetaBar requests and stores contextual data in compliance with the Genomic Standards Consortium specifications. The MetaBar open source code base for local installation is available under the GNU General Public License version 3 (GNU GPL3).

Conclusion: The MetaBar software supports the typical workflow from data acquisition and field sampling to contextual-data-enriched sequence submission to an INSDC database. The integration with the megx.net marine Ecological Genomics database and portal facilitates georeferenced data integration and metadata-based comparisons of sampling sites as well as interactive data visualization. The ample export functionalities and the INSDC submission support enable the exchange of data across disciplines and the safeguarding of contextual data.

Highlights

Environmental sequence datasets are increasing at an exponential rate; the vast majority of them lack appropriate descriptors such as sampling location, time and depth/altitude, generally referred to as metadata or contextual data.
The project EXAMPLE is created in MetaBar and the users PM1 and PM2 are added to EXAMPLE.
Better availability and correctness of contextual data in the primary sequence databases will greatly improve the prospects for a higher level of data integration and interpretation to address basic ecological questions.

Summary

Introduction

Environmental sequence datasets are increasing at an exponential rate; the vast majority of them lack appropriate descriptors such as sampling location, time and depth/altitude, generally referred to as metadata or contextual data. Latitude and longitude (INSDC: lat_lon) and time (INSDC: collection_date), elements of the key contextual data tuple (x,y,z,t), are reported in only 7.3% and 7.2% of all submissions, respectively [Guy Cochrane, personal communication, October 2009]. Even if these data are available, correctness is not guaranteed. The National Center for Biotechnology Information (NCBI), for example, curates the Reference Sequence (RefSeq) database, which aims to provide a comprehensive, non-redundant, well-annotated set of sequences, including genomic DNA, transcripts and proteins http://www.ncbi.nlm.nih.gov/RefSeq/. The European Molecular Biology Laboratory (EMBL) provides the UniProt/SwissProt Knowledgebase, which focuses on high quality protein sequence annotations http://www.ebi.ac.uk/uniprot/ [13]. The common aim of these efforts is to enhance the quality of the sequence or protein data and annotations rather than to provide more information on the data processing or the environment from which the sample or organism was taken.
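To make the (x,y,z,t) tuple concrete, the following is a minimal Python sketch of how a sample record could be checked for completeness and rendered in the style of the INSDC lat_lon and collection_date qualifiers. It is not taken from MetaBar: the SampleContext class, its field names, and the helper functions are hypothetical, and the output formats (decimal degrees with N/S and E/W suffixes, DD-Mon-YYYY dates) are assumed here as one commonly accepted form of those qualifiers.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

# English month abbreviations, spelled out to avoid locale-dependent strftime.
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


@dataclass
class SampleContext:
    """Hypothetical container for the (x, y, z, t) tuple of one sample."""
    sample_id: str
    latitude: Optional[float] = None   # y, decimal degrees, north positive
    longitude: Optional[float] = None  # x, decimal degrees, east positive
    depth_m: Optional[float] = None    # z, depth or altitude in metres
    collected: Optional[date] = None   # t, collection date


def missing_fields(ctx: SampleContext) -> List[str]:
    """Return the names of the (x, y, z, t) fields that are not filled in."""
    names = ("latitude", "longitude", "depth_m", "collected")
    return [name for name in names if getattr(ctx, name) is None]


def format_lat_lon(lat: float, lon: float) -> str:
    """Render coordinates as 'd.dddd N|S d.dddd E|W' (assumed lat_lon style)."""
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        raise ValueError("coordinates out of range")
    ns = "N" if lat >= 0 else "S"
    ew = "E" if lon >= 0 else "W"
    return f"{abs(lat):.4f} {ns} {abs(lon):.4f} {ew}"


def format_collection_date(d: date) -> str:
    """Render a date as DD-Mon-YYYY (assumed collection_date style)."""
    return f"{d.day:02d}-{MONTHS[d.month - 1]}-{d.year}"


# Usage: a record with coordinates but no depth or date is flagged as incomplete.
partial = SampleContext("S1", latitude=54.1833, longitude=7.9)
print(missing_fields(partial))                     # ['depth_m', 'collected']
print(format_lat_lon(54.1833, 7.9))                # 54.1833 N 7.9000 E
print(format_collection_date(date(2009, 10, 5)))   # 05-Oct-2009
```

Checking completeness before formatting mirrors the point made above: a submission is only useful for integrated analysis when all four elements of the tuple are present.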

Journal: BMC Bioinformatics
Publication Date: Jun 30, 2010
Citations: 29
License type: cc-by
