Abstract

The progress of science is tied to the standardization of measurements, instruments, and data. This is especially true in the Big Data age, where analyzing large data volumes critically hinges on the data being standardized. Accordingly, the lack of community‐sanctioned data standards in paleoclimatology has largely precluded the benefits of Big Data advances in the field. Building upon recent efforts to standardize the format and terminology of paleoclimate data, this article describes the Paleoclimate Community reporTing Standard (PaCTS), a crowdsourced reporting standard for such data. PaCTS captures which information should be included when reporting paleoclimate data, with the goal of maximizing the reuse value of paleoclimate data sets, particularly for synthesis work and comparison to climate model simulations. Initiated by the LinkedEarth project, the process to elicit a reporting standard involved an international workshop in 2016, various forms of digital community engagement over the next few years, and grassroots working groups. Participants in this process identified important properties across paleoclimate archives, in addition to the reporting of uncertainties and chronologies; they also identified archive‐specific properties and distinguished reporting standards for new versus legacy data sets. This work shows that at least 135 respondents overwhelmingly support a drastic increase in the amount of metadata accompanying paleoclimate data sets. Since such goals are at odds with present practices, we discuss a transparent path toward implementing or revising these recommendations in the near future, using both bottom‐up and top‐down approaches.

Highlights

  • Paleoclimatology is a highly integrative discipline, often requiring the comparison of multiple datasets and model simulations to reach fundamental insights about the climate system

  • Our approach builds on two synergistic elements: (1) the LinkedEarth Ontology (Emile-Geay et al., 2019), which provides an unambiguous structure and terminology to describe the metadata of a paleoclimate dataset; and (2) the LinkedEarth Platform (Gil et al., 2017), which enables the collaborative authoring of highly structured metadata about paleoclimate datasets using the terms in the LinkedEarth Ontology

  • In Paleoclimate Community reporTing Standard (PaCTS) v1.0, a legacy dataset is defined as a dataset that is not being archived by the author(s) of the original study

Summary

Introduction

Paleoclimatology is a highly integrative discipline, often requiring the comparison of multiple datasets and model simulations to reach fundamental insights about the climate system. Such syntheses are hampered by the time and effort required to transform the data into a usable format for each application. Wrangling involves identifying missing values or outliers in the time series, searching multiple databases for scattered records, contacting the original investigators for missing data and metadata, and organizing the data into a machine-readable format. This wrangling requires an understanding of each dataset’s originating field and its unspoken practices, and so cannot be automated or outsourced to unskilled labor or software.

