Abstract

Metadata curation has become increasingly important for biological discovery and biomedical research because a large amount of heterogeneous biological data is now freely available. To facilitate efficient metadata curation, we developed an easy-to-use web-based curation application, GEOMetaCuration, for curating the metadata of Gene Expression Omnibus datasets. It eliminates mechanical operations that consume precious curation time and helps coordinate curation efforts among multiple curators. It improves the curation process by introducing features that are critical to metadata curation, such as a back-end curation management system and a curator-friendly front-end. The application is built on Django, a commonly used Python web development framework, and is open-sourced under the GNU General Public License v3. GEOMetaCuration is expected to benefit the biocuration community and to contribute to the computational generation of biological insights from large-scale biological data. An example use case can be found at the demo website: http://geometacuration.yubiolab.org.

Database URL: https://bitbucket.com/yubiolab/GEOMetaCuration

Highlights

  • Metadata curation is an essential step for analyzing and integrating heterogeneous large-scale biological datasets generated by numerous labs across the world [1]

  • There are more than 100 000 datasets in one of the most popular public databases for functional genomics data, Gene Expression Omnibus (GEO) [4]

  • Because any individual curator can make errors, each curation task must be assigned to multiple curators, and their results must be compared and integrated to ensure high curation accuracy. Without the assistance of a tool, such a coordinated effort can take a significant amount of time. To address these problems with metadata curation, we developed a curator-friendly web-based application, GEOMetaCuration, for GEO datasets
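
The comparison and integration of results from multiple curators, as described in the last highlight, can be illustrated with a minimal sketch. This is not the GEOMetaCuration implementation: the function name `integrate_annotations`, the curator names, the sample identifiers and the labels are all hypothetical, and a strict majority vote is assumed as the integration rule only for illustration.

```python
from collections import Counter


def integrate_annotations(annotations):
    """Merge per-curator labels into a consensus and a list of conflicts.

    annotations: dict mapping curator name -> {sample_id: label}.
    Returns (consensus, conflicts), where consensus maps sample_id -> label
    and conflicts maps sample_id -> {curator: label} for unresolved samples.
    Hypothetical helper, assuming a strict-majority rule.
    """
    # Regroup the labels so that each sample lists every curator's vote.
    by_sample = {}
    for curator, labels in annotations.items():
        for sample_id, label in labels.items():
            by_sample.setdefault(sample_id, {})[curator] = label

    consensus, conflicts = {}, {}
    for sample_id, votes in by_sample.items():
        top_label, top_count = Counter(votes.values()).most_common(1)[0]
        # Accept a label only if more than half of the curators who
        # annotated this sample agree on it; otherwise flag it for review.
        if top_count * 2 > len(votes):
            consensus[sample_id] = top_label
        else:
            conflicts[sample_id] = votes
    return consensus, conflicts


# Illustrative input; identifiers and labels are made up.
example = {
    "curator_a": {"GSM0001": "control", "GSM0002": "treated", "GSM0003": "treated"},
    "curator_b": {"GSM0001": "control", "GSM0002": "control", "GSM0003": "control"},
    "curator_c": {"GSM0001": "control", "GSM0002": "treated"},
}
consensus, conflicts = integrate_annotations(example)
```

In this sketch, samples with a clear majority go into `consensus`, while a sample on which two curators disagree with no third vote ends up in `conflicts` for manual resolution. A real curation tool would likely also record who resolved each conflict and when.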

Introduction

Metadata curation is an essential step for analyzing and integrating heterogeneous large-scale biological datasets generated by numerous labs across the world [1]. Such data integration presents biomedical researchers with unprecedented opportunities to discover biological insights that remain hidden when each dataset is analyzed separately [2]. As the volume of biological data grows rapidly, curating semi-structured metadata efficiently and accurately becomes increasingly challenging [3]. New methods that facilitate curation are therefore critical to enable streamlined biological discovery from a large number of biological datasets.
