Abstract

Variability in the extent of the descriptions of data (‘metadata’) held in public repositories forces users to assess the quality of records individually, which rapidly becomes impractical. Scoring records on the richness of their description provides a simple, objective proxy measure for quality and enables filtering in support of downstream analysis. Pivotally, such descriptions should spur on improvements. Here, we introduce such a measure, the ‘Metadata Coverage Index’ (MCI): the percentage of available fields actually filled in a record or description. MCI scores can be calculated across a database, for individual records, or for their component parts (e.g., fields of interest). There are many potential uses for this simple metric, for example: to filter, rank or search for records; to assess the metadata availability of an ad hoc collection; to determine the frequency with which fields in a particular record type are filled, especially with respect to standards compliance; to assess the utility of specific tools and resources, and of data capture practice more generally; to prioritize records for further curation; to serve as performance metrics of funded projects; or to quantify the value added by curation. Here we demonstrate the utility of MCI scores using metadata from the Genomes Online Database (GOLD), including records compliant with the ‘Minimum Information about a Genome Sequence’ (MIGS) standard developed by the Genomic Standards Consortium. We discuss challenges and address further applications of MCI scores: to show improvements in annotation quality over time, to inform the work of standards bodies and repository providers on the usability and popularity of their products, and to assess and credit the work of curators. Such an index provides a step towards putting metadata capture practices and, in the future, standards compliance into a quantitative and objective framework.
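The definition above (the percentage of available fields actually filled) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the example field names, and the rule for what counts as "filled" (not `None` and not a blank string) are all assumptions made for illustration. A real repository would need its own rule for placeholder values such as "unknown".

```python
from typing import Any, Iterable, Mapping, Sequence


def is_filled(value: Any) -> bool:
    """Assumed rule: a field is filled if it is not None and not a blank string."""
    if value is None:
        return False
    if isinstance(value, str):
        return value.strip() != ""
    return True


def record_mci(record: Mapping[str, Any], fields: Sequence[str]) -> float:
    """MCI of one record: percentage of the available fields that are filled."""
    if not fields:
        raise ValueError("at least one available field is required")
    filled = sum(1 for name in fields if is_filled(record.get(name)))
    return 100.0 * filled / len(fields)


def field_mci(records: Sequence[Mapping[str, Any]], field: str) -> float:
    """MCI of one field: percentage of records in which that field is filled."""
    if not records:
        raise ValueError("at least one record is required")
    filled = sum(1 for rec in records if is_filled(rec.get(field)))
    return 100.0 * filled / len(records)


def collection_mci(records: Iterable[Mapping[str, Any]], fields: Sequence[str]) -> float:
    """MCI across a collection: filled cells over all record-by-field cells."""
    records = list(records)
    if not records or not fields:
        raise ValueError("records and fields must both be non-empty")
    filled = sum(
        1 for rec in records for name in fields if is_filled(rec.get(name))
    )
    return 100.0 * filled / (len(records) * len(fields))


# Hypothetical field names and values, for illustration only.
FIELDS = ["organism", "isolation_site", "optimum_temperature", "project_relevance"]
RECORDS = [
    {"organism": "Example organism A", "isolation_site": "hot spring",
     "optimum_temperature": None, "project_relevance": ""},
    {"organism": "Example organism B", "isolation_site": "",
     "optimum_temperature": 94, "project_relevance": "biotechnology"},
]

per_record = [record_mci(rec, FIELDS) for rec in RECORDS]
per_field = {name: field_mci(RECORDS, name) for name in FIELDS}
overall = collection_mci(RECORDS, FIELDS)
```

Computing the score per record supports filtering and ranking, per field shows how often a given field is filled (for instance, one required by a standard), and the collection-level score summarizes a whole database or an ad hoc set of records.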

Highlights

  • Pyrobaculum oguniense TE7T (=DSMZ 13380=JCM10595) was originally isolated from the Tsuetate hot spring in Oguni-cho, Kumamoto Prefecture, Japan [1], and subsequently found to grow heterotrophically at an optimal temperature near 94°C, pH 7.0, and in the presence or absence of oxygen

  • The main chromosome is largely syntenic to Pyrobaculum arsenaticum and contains a number of gene clusters that are absent in that species

  • This is of particular interest considering that these species were isolated on opposite sides of the Eurasian continent; P. oguniense was isolated in Japan, while P. arsenaticum was isolated in an arsenic-rich anaerobic pool in Italy


Summary

Complete genome sequence of Pyrobaculum oguniense

Lowe¹. ¹Biomolecular Engineering, University of California, Santa Cruz, California, USA. Pyrobaculum oguniense TE7 is an aerobic hyperthermophilic crenarchaeon isolated from a hot spring in Japan. We describe its main chromosome of 2,436,033 bp, with three large-scale inversions and an extra-chromosomal element of 16,887 bp. Comparative analyses with the closest known relative, the anaerobe Pyrobaculum arsenaticum from Italy, reveal unexpectedly high synteny and nucleotide identity between these two geographically distant species. Deep sequencing of a mixture of genomic DNA from multiple cells has illuminated some of the genome dynamics potentially shared with other species in this genus.

Introduction
Growth conditions and DNA isolation
The Genomic Standards Consortium
Optimum temperature
Genbank Date of Release GOLD ID Project relevance
Genome sequencing and assembly
Genome annotation
Genome properties
Extracellular structures
Findings
Conclusion
