Abstract

BackgroundThe size, velocity, and heterogeneity of Big Data outclasses conventional data management tools and requires data and metadata to be fully machine-actionable (i.e., eScience-compliant) and thus findable, accessible, interoperable, and reusable (FAIR). This can be achieved by using ontologies and through representing them as semantic graphs. Here, we discuss two different semantic graph approaches of representing empirical data and metadata in a knowledge graph, with phenotype descriptions as an example. Almost all phenotype descriptions are still being published as unstructured natural language texts, with far-reaching consequences for their FAIRness, substantially impeding their overall usability within the life sciences. However, with an increasing amount of anatomy ontologies becoming available and semantic applications emerging, a solution to this problem becomes available. Researchers are starting to document and communicate phenotype descriptions through the Web in the form of highly formalized and structured semantic graphs that use ontology terms and Uniform Resource Identifiers (URIs) to circumvent the problems connected with unstructured texts.ResultsUsing phenotype descriptions as an example, we compare and evaluate two basic representations of empirical data and their accompanying metadata in the form of semantic graphs: the class-based TBox semantic graph approach called Semantic Phenotype and the instance-based ABox semantic graph approach called Phenotype Knowledge Graph. Their main difference is that only the ABox approach allows for identifying every individual part and property mentioned in the description in a knowledge graph. This technical difference results in substantial practical consequences that significantly affect the overall usability of empirical data. The consequences affect findability, accessibility, and explorability of empirical data as well as their comparability, expandability, universal usability and reusability, and overall machine-actionability. Moreover, TBox semantic graphs often require querying under entailment regimes, which is computationally more complex.ConclusionsWe conclude that, from a conceptual point of view, the advantages of the instance-based ABox semantic graph approach outweigh its shortcomings and outweigh the advantages of the class-based TBox semantic graph approach. Therefore, we recommend the instance-based ABox approach as a FAIR approach for documenting and communicating empirical data and metadata in a knowledge graph.

Highlights

  • The size, velocity, and heterogeneity of Big Data outclasses conventional data management tools and requires data and metadata to be fully machine-actionable and findable, accessible, interoperable, and reusable (FAIR)

  • We start by specifying the requirements that empirical data must satisfy in the age of eScience and Big Data, with phenotype data as an example. Based on these distinctions and requirements, we introduce and compare the class-based and the instancebased approach for documenting empirical data in the form of semantic graphs

  • From a conceptual point of view, the Phenotype Knowledge Graph approach is superior to the Semantic Phenotypes approach because it minimizes the number of TBox expressions necessary for carrying descriptive contents, which brings about the technical advantage of each entity, quality, and relation being referred to in a description having its own Uniform Resource Identifier7 (URI)

Read more

Summary

Introduction

The size, velocity, and heterogeneity of Big Data outclasses conventional data management tools and requires data and metadata to be fully machine-actionable (i.e., eScience-compliant) and findable, accessible, interoperable, and reusable (FAIR). This can be achieved by using ontologies and through representing them as semantic graphs. Highthroughput technologies, social media, mobile devices, digital imaging, sensors, and the Internet of Things, all contribute to Big Data in science and everyday life, allowing researchers to answer questions that could not be answered before This new driving force for scientific progress in data-rich fields of empirical research has been called data exploration or eScience [5]. Data management tools and with them data and metadata formats and standards have become increasingly important to support various eScience workflows

Methods
Results
Discussion
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call