Abstract

In earth observation and climatological sciences, data and their data services grow on a daily basis in a large spatial extent due to the high coverage rate of satellite sensors, model calculations, but also by continuous meteorological in situ observations. In order to reuse such data, especially data fragments as well as their data services in a collaborative and reproducible manner by citing the origin source, data analysts, e.g., researchers or impact modelers, need a possibility to identify the exact version, precise time information, parameter, and names of the dataset used. A manual process would make the citation of data fragments as a subset of an entire dataset rather complex and imprecise to obtain. Data in climate research are in most cases multidimensional, structured grid data that can change partially over time. The citation of such evolving content requires the approach of “dynamic data citation”. The applied approach is based on associating queries with persistent identifiers. These queries contain the subsetting parameters, e.g., the spatial coordinates of the desired study area or the time frame with a start and end date, which are automatically included in the metadata of the newly generated subset and thus represent the information about the data history, the data provenance, which has to be established in data repository ecosystems. The Research Data Alliance Data Citation Working Group (RDA Data Citation WG) summarized the scientific status quo as well as the state of the art from existing citation and data management concepts and developed the scalable dynamic data citation methodology of evolving data. The Data Centre at the Climate Change Centre Austria (CCCA) has implemented the given recommendations and offers since 2017 an operational service on dynamic data citation on climate scenario data. With the consciousness that the objective of this topic brings a lot of dependencies on bibliographic citation research which is still under discussion, the CCCA service on Dynamic Data Citation focused on the climate domain specific issues, like characteristics of data, formats, software environment, and usage behavior. The current effort beyond spreading made experiences will be the scalability of the implementation, e.g., towards the potential of an Open Data Cube solution.

Highlights

  • Introduction on Dynamic Data CitationCiting datasets in an appropriate manner is agreed upon as good scientific praxis and well established

  • The overall objective of this article was to demonstrate the technical implementation and to provide the future potential of benefits regarding the RDA recommendations, with operational service offered as evidence, such as sustainable storage consumption using the Query Store for the data subset, and automatic adaptation into interoperable metadata description to keep the data provenance information

  • If only a fragment of a dataset is requested, which is served by subset functionalities, a more or less dynamic citation is required [9]

Read more

Summary

Introduction

Introduction on Dynamic Data CitationCiting datasets in an appropriate manner is agreed upon as good scientific praxis and well established. Data citation as a collection of text snippets provides information about the creator of the data, the title, the version, the repository, a time stamp, and a persistent identifier (PID) for persistent data access. With data-driven web services, the data used are not always static, especially in collaborative iteration and creation cycles [14] This is valid for climatological research, where different data sources and models serve as input for new data as derivatives, e.g., climate indices like calculation of the number of tropical nights, which is based on different climate model ensembles. From a data quality point of view, it is preferable that such derivatives be affected and updated automatically by the performed correction chain Such changes in consideration on dependencies in data creation should be communicated as automatically as possible. With the reproducibility of results in mind, it is essential to be able to accurately verify a particular dataset, its exact version, or the creation of data fragments

Objectives
Discussion
Conclusion
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.