Abstract

The Semantic Web and Linked Data concepts and technologies have given the scientific community solutions to take full advantage of the increasingly available data, which are distributed, heterogeneous, and held in distinct silos. In addition, the FAIR Data principles establish guidelines for making data Findable, Accessible, Interoperable, and Reusable, and they are gaining traction in data stewardship. However, to exploit their full potential, we must be able to transform legacy solutions smoothly into the FAIR Data ecosystem. In this paper, we introduce SCALEUS-FD, a FAIR Data extension of a legacy semantic web tool that has been used successfully for data integration and for semantic annotation and enrichment. The core functionalities of the solution follow the Semantic Web and Linked Data principles and offer a FAIR REST API for machine-to-machine operations. We applied a set of metrics to evaluate its “FAIRness” and created an application scenario in the rare diseases domain.

Highlights

  • We present SCALEUS-FD, a semantic web tool that complies with the FAIR Data principles and supports data integration and reuse

  • A comparison between the identified requirements and the features of the legacy tool guided the specification of the SCALEUS-FD building blocks

Summary

Introduction

The creation of large volumes of data by institutions all over the world, driven by widespread computerization, the use of advanced laboratory equipment, and increasing digitization, has transformed the life sciences into data-driven sciences [1]. This exponential growth has left data scattered across a fragmented universe of spreadsheets, databases, nonrelational document repositories, and simple raw data dumps. In most cases these have no exposure outside their institutional framework, and some sit in the long tail of science and technology, which compromises their reuse [2, 3]. Secondary use of data to extract knowledge in the life sciences has increased greatly with the creation of several data repositories and the digitalization of biobanks [9]. This did not immediately produce a coherent data ecosystem, because heterogeneity, sparsity, the coexistence of different formats, and the lack of interoperability between distributed data are obstacles that must still be overcome [10]. An information network built by integrating such data can be searched from a single entry point [18].
