Abstract

Recent emphasis and requirements for open data publication have led to significant increases in data availability in the Earth sciences, which is critical to long-tail data integration. Currently, data are often published in a repository with an identifier and citation, similar to those for papers. Subsequent publications that use the data are expected to provide a citation in the reference section of the paper. However, the format of the data citation is still evolving, particularly with regard to citing dynamic data, subsets, and collections of data. Considering the motivations of both data producers and consumers, the most pressing need is to create user-friendly solutions that provide credit for data producers and enable accurate citation of data, particularly integrated data. Providing easy-to-use data citations is a critical foundation for addressing the socio-technical challenges around data integration. Studies that integrate data from dozens or hundreds of datasets must often include data citations in supplementary material due to page limits. However, citations in the supplementary material are not indexed, making it difficult to track citations and thus to give credit to the data producer. In this paper, we discuss our experiences and the challenges we have encountered with current citation guidance. We also review the relative merits of the currently available mechanisms designed to enable compact citation of collections of data, such as data collections, data papers, and dynamic data citations. We consider these options for three data producer scenarios: a domain-specific data collection, a data repository, and a large-scale, multidisciplinary project. We posit that a new mechanism is also needed to enable citation of multiple datasets and credit to data producers.

Highlights

  • Funders of Earth science projects and academic publishers increasingly require scientists to publish data in an open-access data repository (Cousijn et al., 2018; Data Citation Synthesis Group, 2014; Office of Science, 2013; Stall et al., 2019)

  • We present a perspective informed by working closely over many years with several US Department of Energy (DOE) projects, including the Environmental Systems Science Data Infrastructure for a Virtual Ecosystem (ESS-DIVE) data repository (Varadharajan et al., 2018; https://ess-dive.lbl.gov/), the AmeriFlux carbon flux network (Novick et al., 2018), the Watershed Function Science Focus Area (WFSFA; http://watershed.lbl.gov) (Varadharajan et al., 2019), and Next-Generation Ecosystem Experiments - Tropics (NGEE-Tropics; https://ngee-tropics.lbl.gov/)

  • The format of a dataset citation is relatively well defined (DataCite Metadata Working Group, 2019; ESIP Data Preservation and Stewardship Committee, 2019), but the ways to complete some of the metadata fields that contribute to the citation are still evolving


Summary

Introduction

Funders of Earth science projects and academic publishers increasingly require scientists to publish data in an open-access data repository (Cousijn et al., 2018; Data Citation Synthesis Group, 2014; Office of Science, 2013; Stall et al., 2019). The DataCite organization maintains a preferred schema and registration service for the metadata required to obtain a Digital Object Identifier (DOI) for data (DataCite Metadata Working Group, 2019), as well as best practices for data citations. The repository assigns a DOI to each published dataset and provides an automated citation using DOI schema metadata fields (e.g., authors, title, publication year, publisher) (Fenner et al., 2019). We discuss the citation challenges encountered when integrating data from a large number of data packages, as well as some of the emerging solutions.
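To illustrate how a repository might assemble an automated citation from the metadata fields named above, the following is a minimal Python sketch. The class name, function name, field names, and the citation pattern (authors, year, title, publisher, DOI link) are assumptions made for illustration and are not the schema or the implementation of any particular repository. All values in the example record, including the DOI, are placeholders.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DatasetMetadata:
    """Hypothetical subset of the DOI metadata fields named in the text."""
    authors: List[str]
    title: str
    publication_year: int
    publisher: str
    doi: str


def format_citation(meta: DatasetMetadata) -> str:
    """Build a string of the form 'Authors (Year): Title. Publisher. DOI link'."""
    authors = "; ".join(meta.authors)
    return (
        f"{authors} ({meta.publication_year}): {meta.title}. "
        f"{meta.publisher}. https://doi.org/{meta.doi}"
    )


# Placeholder values for illustration only; not a real dataset or DOI.
example = DatasetMetadata(
    authors=["Doe, J.", "Roe, R."],
    title="Example soil moisture observations",
    publication_year=2020,
    publisher="Example Repository",
    doi="10.0000/example",
)
print(format_citation(example))
```

A study that integrates hundreds of data packages would need one such string per package, which is the scaling problem that motivates the collective citation mechanisms discussed in this paper.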

Example earth science data producers and consumers
ESS-DIVE
AmeriFlux
NGEE–Tropics
Socio-technical challenges of data citations
Data citation
Collective data citation
Data collections
Data papers
Scalable dynamic data citations
Discussion
Summary