Abstract

Data science is facing the following major challenges: (1) developing scalable cross-disciplinary capabilities, (2) dealing with the increasing data volumes and their inherent complexity, (3) building tools that help to build trust, (4) creating mechanisms to efficiently operate in the domain of scientific assertions, (5) turning data into actionable knowledge units and (6) promoting data interoperability. As a way to overcome these challenges, we further develop the proposals by early Internet pioneers for Digital Objects as encapsulations of data and metadata made accessible by persistent identifiers. In the past decade, this concept was revisited by various groups within the Research Data Alliance and put in the context of the FAIR Guiding Principles for findable, accessible, interoperable and reusable data. The basic components of a FAIR Digital Object (FDO) as a self-contained, typed, machine-actionable data package are explained. A survey of use cases has indicated the growing interest of research communities in FDO solutions. We conclude that the FDO concept has the potential to act as the interoperable federative core of a hyperinfrastructure initiative such as the European Open Science Cloud (EOSC).
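
To make the components named in the abstract concrete, the sketch below models an FDO in the spirit of a self-contained, typed, machine-actionable data package: a persistent identifier, a reference to a registered type, a metadata record, and pointers to the bit sequences. This is an illustrative sketch only; the class, field names, and example identifiers are hypothetical and are not taken from the paper.

from dataclasses import dataclass, field


@dataclass
class FAIRDigitalObject:
    """Hypothetical minimal model of an FDO: a typed, self-describing
    package addressed by a persistent identifier (PID)."""
    pid: str                                   # e.g. a Handle; this example value is made up
    fdo_type: str                              # PID of a registered type definition
    metadata: dict = field(default_factory=dict)            # descriptive and provenance metadata
    bit_sequence_refs: list = field(default_factory=list)   # locations of the actual bit sequences

    def is_machine_actionable(self) -> bool:
        # Crude stand-in check: a client can only act on the object if it can
        # resolve the PID, read the type, and locate the bit sequences.
        return bool(self.pid and self.fdo_type and self.bit_sequence_refs)


# Illustrative record; identifiers and URLs are invented for this sketch.
fdo = FAIRDigitalObject(
    pid="21.T11148/example-0001",
    fdo_type="21.T11148/genomic-sequence",
    metadata={"title": "Example dataset", "license": "CC-BY-4.0"},
    bit_sequence_refs=["https://repository.example.org/data/0001.fasta"],
)
print(fdo.is_machine_actionable())  # True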

Highlights

  • Since about the turn of the millennium, it has become apparent that the rapid acceleration in the production of research data has not been matched by an equivalent acceleration in our access to all that data [1]

  • We propose to take the work on persistent identifiers a major step forward by encapsulating sufficient information about a dataset into a FAIR Digital Object (FDO); a minimal PID-resolution sketch follows this list

  • We briefly discuss some of the current limitations that potentially impede the amplification of actionable knowledge production from the vast volume of newly produced and available data, and we examine the scientific value of FAIR Digital Objects as a unified data organisation model, in particular for data science across the boundaries of domains and disciplines
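
The highlight on persistent identifiers can be illustrated with a small resolution sketch. It assumes the public Handle.Net proxy's JSON endpoint (https://hdl.handle.net/api/handles/<handle>), which returns the PID record as a list of typed values that software can act on; the handle used here is hypothetical and the function name is ours, not the paper's.

import json
import urllib.request


def resolve_handle(handle: str) -> dict:
    """Fetch the PID record for a Handle from the public Handle.Net proxy.

    The proxy returns the record as JSON; its typed values (URL, checksum,
    type reference, ...) are what allow software to act on the object
    without human interpretation.
    """
    url = f"https://hdl.handle.net/api/handles/{handle}"
    with urllib.request.urlopen(url) as response:
        return json.load(response)


# Hypothetical handle; substitute a real PID to try the call.
record = resolve_handle("21.T11148/example-0001")
for value in record.get("values", []):
    print(value["type"], value["data"].get("value"))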

Summary

Introduction

Since about the turn of the millennium, it has become apparent that the rapid acceleration in the production of research data has not been matched by an equivalent acceleration in our access to all that data [1]. By “data” we do not just mean published data, but any data that has been created in research labs and lifted from its original workspace to a domain where it can be managed and shared. Created and collected data reside in temporary workspaces. Most of the data to be shared with others in internal or external workflows will move to the registered data domain, in which rapidly increasing amounts of data are being amassed and managed for reuse. A small fraction of those data will be formally documented and published so that they can be properly cited based on metadata and according to publishers’ requirements.

Layers
Principal Challenges for Data-Intensive Science
Drowning in Data?
Interpreting Scientific Evidence in a Trusted Context
Advancing Data to Actionable Knowledge Units
Tool Proliferation and Fundamental Decisions
Layers accessed through a PID
Digital objects in the DFT Core Model
From Digital Objects towards FAIR Digital Objects
Scientific Use Cases
Automatic Processing and Workflows
Stable Domain of Scientific Entities and Relationships
Advanced Plans for Management and Security
Infrastructural and Networking Interests
Conclusions