Abstract

One of the key goals of the FAIR guiding principles is defined by its final principle – to optimize data sets for reuse by both humans and machines. To do so, data providers need to implement and support consistent, machine-readable metadata that describes their data sets. This can seem like a daunting task for data providers, whether it is determining what level of detail the provenance metadata should provide or figuring out which common shared vocabularies to use. Additionally, for existing data sets it is often unclear what steps should be taken to enable maximal, appropriate reuse. Data citation already plays an important role in making data findable and accessible, providing persistent and unique identifiers plus metadata for over 16 million data sets. In this paper, we discuss how data citation and its underlying infrastructures, in particular the associated metadata, provide an important pathway for enabling FAIR data reuse.

Highlights

  • Data citation has been a core part of the infrastructure in the movement toward Open Science [1]

  • At the initiative of an expert group convened by FORCE11, support for data citation was incorporated into version 1.2 of the ANSI/NISO JATS XML schema, which is required for deposition in repositories [2]

  • We address the use of data citation infrastructure for both expressing data provenance and contextualizing data for reuse


Summary

INTRODUCTION

Data citation has been a core part of the infrastructure in the movement toward Open Science [1]. Guidelines for data citation and its recommended format have recently been outlined by a group of publishers [3]. These guidelines ensure that data citations are consistent in terms of both human and machine readability, and compatible with existing publisher practices. Like the data citation string itself, data citation infrastructure often builds upon existing scholarly infrastructure. It extends this infrastructure with new functionality that provides a strong foundation not just for referring to data, but for embedding it in the scholarly ecosystem and making it more reusable. The aim of this paper is to introduce an exemplar data citation infrastructure, as implemented by DataCite, a global non-profit organization that provides persistent identifiers (DOIs [13]) for research data and other research outputs, and to show how its capabilities may be used to enhance the reusability of data. We also touch on the need to ground data citation in the scientific social ecosystem through the scholarly literature.
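To illustrate how a single metadata record can serve both human and machine readers, the sketch below renders one data set record two ways: as a citation string and as JSON. It is a minimal, hypothetical example, not DataCite's implementation. The `DatasetRecord` class and the `human_citation` and `machine_record` functions are invented for illustration. The citation follows the element order DataCite has recommended (Creator (PublicationYear): Title. Version. Publisher. (ResourceTypeGeneral). Identifier). The JSON field names are only loosely modeled on DataCite metadata and cover a small subset of it. The DOI and all other values are made up.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DatasetRecord:
    """Hypothetical, simplified subset of DataCite-style metadata for one data set."""
    creators: List[str]            # "Family, Given" strings
    title: str
    publisher: str
    publication_year: int
    doi: str                       # bare DOI, without the resolver prefix
    version: Optional[str] = None
    resource_type_general: str = "Dataset"


def human_citation(rec: DatasetRecord) -> str:
    """Render a human-readable citation string in the order
    Creator (PublicationYear): Title. Version. Publisher. (ResourceTypeGeneral). Identifier"""
    parts = [f"{'; '.join(rec.creators)} ({rec.publication_year}): {rec.title}."]
    if rec.version:                                  # version is optional
        parts.append(f"Version {rec.version}.")
    parts.append(f"{rec.publisher}.")
    parts.append(f"({rec.resource_type_general}).")
    parts.append(f"https://doi.org/{rec.doi}")       # DOI expressed as a resolvable URL
    return " ".join(parts)


def machine_record(rec: DatasetRecord) -> str:
    """Serialize the same metadata as JSON. Field names are loosely modeled
    on DataCite metadata (an assumption; this is not the full schema)."""
    payload = {
        "doi": rec.doi,
        "creators": [{"name": name} for name in rec.creators],
        "titles": [{"title": rec.title}],
        "publisher": rec.publisher,
        "publicationYear": rec.publication_year,
        "types": {"resourceTypeGeneral": rec.resource_type_general},
    }
    if rec.version:
        payload["version"] = rec.version
    return json.dumps(payload, indent=2, sort_keys=True)


if __name__ == "__main__":
    # Hypothetical record with an invented DOI, for illustration only.
    record = DatasetRecord(
        creators=["Doe, Jane", "Roe, Richard"],
        title="Example ocean temperature measurements",
        publisher="Example Data Repository",
        publication_year=2019,
        doi="10.5072/example-123",
        version="1.0",
    )
    print(human_citation(record))
    print(machine_record(record))
```

Both views are derived from the same fields, so the identifier a reader sees in the citation string is the same one a machine finds in the structured record. This shared identifier is what keeps the two representations consistent.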

UNDERSTANDING DATA CITATION INFRASTRUCTURE
METADATA AND DATA CITATION INFRASTRUCTURE
PLACING DATA IN THE GLOBAL PROVENANCE GRAPH
BUILDING RESEARCH OBJECTS USING DATA CITATION RECORDS
REUSE AND THE IMPORTANCE OF THE HUMAN
CONCLUSION
