Abstract

Research data management currently faces a huge increase in data objects, with a growing variety of types (data types, formats) and of workflows by which data infrastructures need to manage objects across their lifecycle. Researchers desire to shorten the workflows from data generation to analysis and publication, and the full workflow needs to become transparent to multiple stakeholders, including research administrators and funders. This poses challenges for research infrastructures and user-oriented data services: data and workflows must be made findable, accessible, interoperable and reusable, and this must be done in a way that leverages machine support for better efficiency. One primary need to be addressed is findability, and better findability in turn benefits other aspects of data and workflow management. In this article, we describe how machine capabilities can be extended to make workflows more findable, in particular by leveraging the Digital Object Architecture, common object operations and machine learning techniques.

Highlights

  • In several scientific disciplines, the number, size and variety of objects to be managed are growing

  • Researchers desire to shorten the workflows from data generation to analysis and publication, and the full workflow needs to become transparent to multiple stakeholders, including research administrators and funders

  • Of the many facets of this challenge that could be derived from the FAIR principles, this article focuses on the automation of findability, emphasizing that identifiers are a foundational element (https://b2find.eudat.eu, https://data.csiro.au/dap)


Summary

INTRODUCTION

The number, size and variety of objects to be managed are growing. To support machine-actionable processes in data infrastructures and VREs, objects (including data and workflows, but possibly other artefacts) need to be persistently identifiable independently of their location (F1) [7]. This is the primary prerequisite for any other benefits. One well-established approach for addressing concerns of reproducibility, automation and provenance in particular is the use of scientific workflow systems (e.g., [9]). These have seen larger adoption in the “-omics” research area, but are less adopted for climate or geophysics data processing scenarios, in contrast to the adoption of interactive Python via Jupyter notebooks. While this may seem a generally good requirement for any technical system, it is even more critically so if the system is built on automated processes capable of limited autonomy.
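
The requirement that objects be persistently identifiable regardless of where they are stored can be illustrated with a minimal sketch. It is not the article's implementation: the `PidRegistry` and `PidRecord` classes, the identifier string and the example URLs are all hypothetical, and a real infrastructure would rely on a dedicated resolution service rather than an in-memory dictionary. The sketch only shows that the identifier stays stable while the storage location behind it changes.

```python
from dataclasses import dataclass, field


@dataclass
class PidRecord:
    """Hypothetical record stored behind a persistent identifier."""
    object_type: str  # e.g. "dataset" or "workflow"
    locations: list[str] = field(default_factory=list)


class PidRegistry:
    """Toy in-memory resolver (assumption: stands in for a real resolution service)."""

    def __init__(self) -> None:
        self._records: dict[str, PidRecord] = {}

    def register(self, pid: str, record: PidRecord) -> None:
        # Refuse to reuse an identifier, so each one names exactly one object.
        if pid in self._records:
            raise ValueError(f"identifier already registered: {pid}")
        self._records[pid] = record

    def resolve(self, pid: str) -> PidRecord:
        # Look the object up by identifier, never by location.
        return self._records[pid]

    def relocate(self, pid: str, new_locations: list[str]) -> None:
        # Only the record changes; the identifier itself stays the same.
        self._records[pid].locations = list(new_locations)


registry = PidRegistry()
pid = "21.T99999/example-workflow"  # made-up identifier for illustration
registry.register(pid, PidRecord("workflow", ["https://old.example.org/wf/1"]))

# The object moves to a new storage location; clients keep using the same identifier.
registry.relocate(pid, ["https://new.example.org/archive/wf/1"])
resolved = registry.resolve(pid)
print(resolved.object_type, resolved.locations)
```

Because clients only ever hold the identifier, an automated process that found the workflow before the move can still retrieve it afterwards without any change on its side.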

ELEMENTS OF A POSSIBLE SOLUTION
EXTENDING CAPABILITIES WITH MACHINE LEARNING
OUTLOOK AND CONCLUSIONS