Interoperability and FAIRness through a novel combination of Web technologies

Mark D Wilkinson, Rajaram Kaliyaperumal, Erik A Schultes, Morris A Swertz, Anand Gavai, Mark Thompson, Jerven T Bolleman, Michel Dumontier, Paolo Ciccarese, Arnold Kuzniar, Fleur D.L Kelpin, Tim Clark, Ruben Verborgh, Alasdair J.G Gray, Erik M Van Mulligen, Luiz Olavo Bonino Da Silva Santos

doi:10.7717/peerj-cs.110

Abstract

Data in the life sciences are extremely diverse and are stored in a broad spectrum of repositories ranging from those designed for particular data types (such as KEGG for pathway data or UniProt for protein data) to those that are general-purpose (such as FigShare, Zenodo, Dataverse or EUDAT). These data have widely different levels of sensitivity and security considerations. For example, clinical observations about genetic mutations in patients are highly sensitive, while observations of species diversity are generally not. The lack of uniformity in data models from one repository to another, and in the richness and availability of metadata descriptions, makes integration and analysis of these data a manual, time-consuming task with no scalability. Here we explore a set of resource-oriented Web design patterns for data discovery, accessibility, transformation, and integration that can be implemented by any general- or special-purpose repository as a means to assist users in finding and reusing their data holdings. We show that by using off-the-shelf technologies, interoperability can be achieved at the level of an individual spreadsheet cell. We note that the behaviours of this architecture compare favourably to the desiderata defined by the FAIR Data Principles, and can therefore represent an exemplar implementation of those principles. The proposed interoperability design patterns may be used to improve discovery and integration of both new and legacy data, maximizing the utility of all scholarly outputs.

Highlights

Carefully-generated data are the foundation for scientific conclusions, new hypotheses, discourse, disagreement and resolution of these disagreements, all of which drive scientific discovery
The proposed approach combines three elements: data transformed into Resource Description Framework (RDF), described by Triple Descriptors, and served via Triple Pattern Fragments (TPF)-compliant URLs
We examine a FAIR Accessor to a dataset, created through a database query, that consists of a specific "slice" of the Protein records within the UniProt database: the set of proteins in Aspergillus nidulans FGSC A4 (NCBI Taxonomy ID 227321) that are annotated as being involved in mRNA Processing (Gene Ontology Accession GO:0006397)
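
As a rough illustration of what a TPF-style request for the UniProt slice above might look like, the following Python sketch builds URLs that each ask for the triples matching one triple pattern. It is a minimal sketch under stated assumptions, not the authors' implementation: the endpoint URL and the tpf_url helper are hypothetical, the subject/predicate/object query parameter names follow a convention common in TPF servers rather than anything stated here, and the UniProt and Gene Ontology URIs are assumed identifiers for the organism and GO annotation.

```python
from urllib.parse import urlencode

# Hypothetical TPF endpoint, used for illustration only.
TPF_ENDPOINT = "http://example.org/fragments/uniprot"

# Assumed identifiers for the slice in the text (not taken from the source).
UP_ORGANISM = "http://purl.uniprot.org/core/organism"
UP_CLASSIFIED_WITH = "http://purl.uniprot.org/core/classifiedWith"
TAXON_227321 = "http://purl.uniprot.org/taxonomy/227321"
GO_0006397 = "http://purl.obolibrary.org/obo/GO_0006397"


def tpf_url(endpoint, subject=None, predicate=None, obj=None):
    """Build a URL requesting the triples that match one triple pattern.

    Positions left as None are omitted from the query string, so they
    act as wildcards in the pattern.
    """
    params = {}
    if subject is not None:
        params["subject"] = subject
    if predicate is not None:
        params["predicate"] = predicate
    if obj is not None:
        params["object"] = obj
    query = urlencode(params)
    return f"{endpoint}?{query}" if query else endpoint


# Pattern 1: any protein whose organism is taxon 227321.
organism_pattern = tpf_url(TPF_ENDPOINT, predicate=UP_ORGANISM, obj=TAXON_227321)

# Pattern 2: any protein annotated with GO:0006397.
go_pattern = tpf_url(TPF_ENDPOINT, predicate=UP_CLASSIFIED_WITH, obj=GO_0006397)

if __name__ == "__main__":
    print(organism_pattern)
    print(go_pattern)
```

A client would fetch both fragments and intersect their subjects to recover the slice. Each request stays a simple single-pattern lookup, and the join is performed on the client side.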

Summary

Introduction

Carefully-generated data are the foundation for scientific conclusions, new hypotheses, discourse, disagreement and resolution of these disagreements, all of which drive scientific discovery. As the volume and complexity of data continue to grow, a data publication and distribution infrastructure is beginning to emerge that is not ad hoc, but rather explicitly designed to support discovery, accessibility, (re)coding to standards, integration, machine-guided interpretation, and re-use. In this text, we use the word "data" to mean all digital research artefacts, whether they be data (in the traditional sense), research-oriented digital objects such as workflows, or combinations/packages of these (i.e., the concept of a "research object" (Bechhofer et al., 2013)). General purpose repositories are less likely to have rich APIs, often requiring manual discovery and download; more importantly, the frequent lack of harmonization of the file types/formats and coding systems in the repository, and lack of curation, results in much of their content being unusable (Roche et al., 2015).



Journal: PeerJ Computer Science	Publication Date: Apr 24, 2017
Citations: 65	License type: CC BY 4.0
