Abstract

Data and metadata interoperability between data storage systems is a critical component of the FAIR data principles. Programmatic and consistent means of reconciling metadata models between databases promote data exchange and thus make data more accessible to the scientific community. This process requires (i) a metadata mapping between the models and (ii) software to perform the mapping. Here, we describe our efforts to map metadata associated with genome assemblies between National Center for Biotechnology Information (NCBI) data resources and the Chado biological database schema. We present mappings for multiple NCBI data structures and introduce a Tripal software module, Tripal EUtils, that pulls metadata from NCBI into a Tripal/Chado database. We discuss potential mapping challenges and solutions, and provide suggestions for future development to further increase interoperability between these platforms. Database URL: https://github.com/NAL-i5K/tripal_eutils.

Highlights

  • We focus on National Center for Biotechnology Information (NCBI) databases that (i) make their data and metadata available via NCBI’s E-utilities and (ii) represent data and/or metadata relevant to genome assemblies
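As a rough illustration of how NCBI's E-utilities expose metadata, the sketch below only builds (and does not send) an ESummary request URL for a list of Entrez UIDs. The endpoint and the `db`, `id`, `retmode` and `api_key` parameters belong to NCBI's E-utilities; the function name `build_esummary_url` and the example UIDs are hypothetical, and this is not code from Tripal EUtils, which is a Tripal (Drupal/PHP) module.

```python
from urllib.parse import urlencode

# Base URL of NCBI's Entrez Programming Utilities (E-utilities).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esummary_url(db, ids, api_key=None):
    """Return an ESummary URL requesting JSON summaries for the given UIDs.

    db      -- an Entrez database name, e.g. "assembly", "bioproject" or "biosample"
    ids     -- iterable of Entrez UIDs (ints or strings), sent comma-separated
    api_key -- optional NCBI API key, which raises the permitted request rate
    """
    params = {"db": db, "id": ",".join(str(i) for i in ids), "retmode": "json"}
    if api_key:
        params["api_key"] = api_key
    # urlencode percent-encodes the comma separator as %2C.
    return f"{EUTILS_BASE}/esummary.fcgi?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical assembly UIDs, used only to show the URL shape.
    print(build_esummary_url("assembly", [123, 456]))
```

Fetching the URL and parsing the JSON response are deliberately left out; a real client would also need to respect NCBI's request-rate limits.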

Summary

Introduction

Biologists increasingly recognize the need to make data and metadata more findable, accessible, interoperable and reusable (FAIR) [1]. These principles provide a framework for improving research data cyberinfrastructure and equip scientists to use public data to enhance knowledge discovery. Modeling the full structure of data and metadata and creating appropriate linkages between datasets require sophisticated data storage structures, such as relational databases. When data exist in two different structures (whether flat files, relational databases or something else), an inability to map between those structures can slow down or entirely prevent data integration and data reuse.
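To make the idea of mapping between two structures concrete, here is a minimal sketch under stated assumptions: it translates a flat record, shaped like an NCBI Assembly summary, into a row for Chado's `analysis` table and reports required columns that the source record cannot fill. `FIELD_MAP`, `map_record`, the NCBI key names and the example values are hypothetical; they are not the mapping published in this work, and the NCBI keys should be checked against a real response. The target names are columns of Chado's `analysis` table, where `program` and `programversion` are declared NOT NULL.

```python
from typing import Dict, List, Tuple

# Hypothetical field map: NCBI Assembly summary keys (assumed) -> Chado analysis columns.
FIELD_MAP: Dict[str, str] = {
    "assemblyname": "name",
    "assemblyaccession": "sourcename",
    "ftppath_genbank": "sourceuri",
}

# Chado's analysis table requires these columns, so a row lacking them
# cannot be inserted until the gap is resolved.
REQUIRED_COLUMNS = ("program", "programversion")


def map_record(record: Dict[str, str]) -> Tuple[Dict[str, str], List[str]]:
    """Translate one NCBI record into a Chado-style row.

    Returns the row plus a sorted list of required columns the record could
    not supply, leaving the resolution policy to the caller.
    """
    # Copy only fields that are present and non-empty in the source record.
    row = {col: record[key] for key, col in FIELD_MAP.items() if record.get(key)}
    missing = sorted(col for col in REQUIRED_COLUMNS if col not in row)
    return row, missing


if __name__ == "__main__":
    # Placeholder values, not a real assembly.
    example = {"assemblyname": "Example_v1.0", "assemblyaccession": "GCA_000000000.1"}
    row, missing = map_record(example)
    print(row)      # {'name': 'Example_v1.0', 'sourcename': 'GCA_000000000.1'}
    print(missing)  # ['program', 'programversion']
```

Returning the unfilled required columns, rather than inventing defaults, keeps one kind of mapping challenge visible: a target schema can demand information that the source structure does not record.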
