Triple Store Research Articles

OpenBiodiv is a complex ecosystem of tools and services, running on top of a graph database, that converts the XML narratives of biodiversity articles, including their Darwin Core data, into RDF as Linked Open Data (LOD). OpenBiodiv provides four main types of services: searching named entities (e.g., taxon names, taxon concepts, treatments, specimens, occurrences, gene sequences, bibliographic information, institutions, persons) in context, within and between articles; answering questions based on the presence of certain named entities within specific article sections (e.g., titles, abstracts, introduction or other sections, taxon treatments); identifying article sections for further text processing (NLP) and providing contextual information stored in MongoDB; and federating the SPARQL endpoint with other triple stores to enrich the discovered knowledge. Conversion of such data into RDF follows a general semantic model expressed in the OpenBiodiv-O ontology, an extension of the Treatment Ontology for knowledge representation of current and legacy biodiversity publications (Senderov et al. 2018), and draws on two main sources: the full-text article XML published on the ARPHA Publishing Platform, and the taxon treatments extracted by Plazi’s TreatmentBank from more than 100 biodiversity journals and stored in the Biodiversity Literature Repository at Zenodo.
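
The second type of service, answering questions based on the presence of named entities within specific article sections, reduces to a graph pattern match. The Python sketch below assumes a toy set of triples with made-up `ex:` predicates (`ex:hasSection`, `ex:sectionType`, `ex:mentions`) and made-up taxon names; these are not OpenBiodiv-O terms, and against the real graph such a question would presumably be written as a SPARQL query.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    s: str
    p: str
    o: str


# Hypothetical toy data (placeholder "ex:" terms, not OpenBiodiv-O vocabulary):
# articles are split into typed sections, and each section lists the taxon
# names recognised in it.
TRIPLES = [
    Triple("ex:article1", "ex:hasSection", "ex:a1-abstract"),
    Triple("ex:a1-abstract", "ex:sectionType", "Abstract"),
    Triple("ex:a1-abstract", "ex:mentions", "Aus bus"),
    Triple("ex:article1", "ex:hasSection", "ex:a1-treatment"),
    Triple("ex:a1-treatment", "ex:sectionType", "Treatment"),
    Triple("ex:a1-treatment", "ex:mentions", "Cus dus"),
    Triple("ex:article2", "ex:hasSection", "ex:a2-intro"),
    Triple("ex:a2-intro", "ex:sectionType", "Introduction"),
    Triple("ex:a2-intro", "ex:mentions", "Aus bus"),
]


def objects(s: str, p: str) -> set[str]:
    """All objects o such that (s, p, o) is in TRIPLES."""
    return {t.o for t in TRIPLES if t.s == s and t.p == p}


def articles_mentioning(name: str, section_type: str) -> set[str]:
    """Articles having a section of `section_type` that mentions `name`.

    Roughly the SPARQL basic graph pattern:
      ?article ex:hasSection ?sec .
      ?sec ex:sectionType "Abstract" ; ex:mentions "Aus bus" .
    """
    hits = set()
    for t in TRIPLES:
        if t.p == "ex:hasSection":
            section = t.o
            right_type = section_type in objects(section, "ex:sectionType")
            has_name = name in objects(section, "ex:mentions")
            if right_type and has_name:
                hits.add(t.s)
    return hits


print(sorted(articles_mentioning("Aus bus", "Abstract")))      # ['ex:article1']
print(sorted(articles_mentioning("Aus bus", "Introduction")))  # ['ex:article2']
```

The same name matches different articles depending on the section it is restricted to, which is the distinction that section-aware queries make possible.
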
To ensure efficiency, quality control and fast tracking of all stages, the entire process of extraction, conversion to RDF and indexing of the content has been rebuilt on the Apache Kafka event streaming platform (Fig. 1). In this new form, OpenBiodiv not only provides a GraphDB SPARQL query endpoint but also indexes the named entities through Elasticsearch and provides data to end users through a RESTful API and a number of user applications. OpenBiodiv is designed for a wide range of users interested in deep-level bibliographic exploration, ontology-linked search of various data elements (e.g., specimens, sequences, taxon concepts, persons), or co-occurrence of named entities (e.g., taxon names with possible biotic relationships between them, or taxon names and the habitats they potentially occupy) in pre-defined sections of the articles. The SPARQL endpoint allows complex queries of various kinds (Dimitrova et al. 2021).
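
An event-streaming pipeline can be pictured as a chain of consumers and producers: each stage reads events from one topic and writes its result to the next, and every document's progress is recorded as it moves along. The Python sketch below is only an analogy under stated assumptions: in-memory deques stand in for Kafka topics, a dictionary stands in for the Elasticsearch index, and the `<taxon>` element, the `example.org` IRIs and the stage functions are hypothetical, not OpenBiodiv's actual pipeline.

```python
from collections import defaultdict, deque
import xml.etree.ElementTree as ET

# In-memory deques stand in for Kafka topics (assumption: this is not the Kafka API).
topics: dict[str, deque] = defaultdict(deque)
status: dict[str, str] = {}                    # document id -> last topic it reached
index: dict[str, set] = defaultdict(set)       # toy stand-in for an Elasticsearch index


def publish(topic: str, event: dict) -> None:
    """Append an event to a topic and record how far the document has progressed."""
    topics[topic].append(event)
    status[event["id"]] = topic


def extract(event: dict) -> None:
    """Stage 1: pull taxon names from article XML (hypothetical <taxon> element)."""
    root = ET.fromstring(event["xml"])
    names = [el.text for el in root.iter("taxon")]
    publish("extracted", {"id": event["id"], "names": names})


def convert(event: dict) -> None:
    """Stage 2: emit N-Triples lines for each name (hypothetical example.org IRIs)."""
    subject = f"<http://example.org/article/{event['id']}>"
    lines = [f'{subject} <http://example.org/mentions> "{n}" .' for n in event["names"]]
    publish("converted", {"id": event["id"], "names": event["names"], "ntriples": lines})


def index_names(event: dict) -> None:
    """Stage 3: make each name searchable by the documents that mention it."""
    for n in event["names"]:
        index[n].add(event["id"])
    publish("indexed", {"id": event["id"]})


def drain(stage, topic: str) -> None:
    """Consume all pending events on a topic; no offsets, retries or parallelism here."""
    while topics[topic]:
        stage(topics[topic].popleft())


publish("raw", {"id": "doc1",
                "xml": "<article><taxon>Aus bus</taxon><taxon>Cus dus</taxon></article>"})
drain(extract, "raw")
drain(convert, "extracted")
drain(index_names, "converted")
print(status["doc1"])   # indexed
print(sorted(index))    # ['Aus bus', 'Cus dus']
```

Because each stage only reads from and writes to topics, a slow or failed stage can be rerun or scaled without touching the others; in a real Kafka deployment, committed consumer offsets rather than `popleft()` would track what has already been processed.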

The "biodiversity knowledge graph" is a nice metaphor for connecting biodiversity data sources, but can we actually build it? Do we have sufficient linked data available? Given that a knowledge graph is an aggregation of data from multiple sources, how do we give those sources credit for that data, and how do we handle changes to that data? Given that the classic interface to a knowledge graph is an intimidatingly empty SPARQL query box, how do we make the knowledge within a graph more accessible? This talk discusses an attempt to build a knowledge graph with an eye on how to maintain the graph in the future. It adopts a model similar to that of the Global Biodiversity Information Facility (GBIF) and CheckListBank, where individual data providers make datasets available as independently citable units with Digital Object Identifiers (DOIs). Each dataset comprises linked data in the form of N-Triples. To create a knowledge graph we simply download one or more such datasets and add them to a triple store. Each data source is assigned to its own named graph, so that we have provenance for each dataset and can update any dataset independently. Furthermore, anyone can build their own knowledge graph by mixing and matching the set of data (people, publications, taxa, etc.) most appropriate to their interests. To bootstrap this approach, exemplar datasets are created based on data harvested from ORCID, Zenodo, and taxonomic name databases. Each demonstration dataset could be replaced in the future by data published directly by those providers. In some cases there are sufficient shared identifiers (such as DOIs and ORCIDs) to form a graph, but taxonomic data typically forms isolated islands. To help the knowledge graph coalesce we need "glue" in the form of datasets that link pairs of different identifiers, such as Life Science Identifiers (LSIDs) for names to DOIs for publications.
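
The named-graph model described above can be sketched in a few lines of Python. This is a toy under explicit assumptions: the dataset DOIs, ORCID iDs and LSID are hypothetical, and the parser accepts only a simplified subset of N-Triples (no blank nodes, escapes, language tags or datatypes). A real deployment would load the files into a triple store that supports named graphs.

```python
import re

# Simplified N-Triples: IRI subject/predicate, IRI or plain "..." literal object.
# Blank nodes, escapes, language tags and datatypes are NOT handled (assumption).
IRI = r"(<[^>]*>)"
OBJ = r'(<[^>]*>|"[^"]*")'
NT_LINE = re.compile(rf"^\s*{IRI}\s+{IRI}\s+{OBJ}\s*\.\s*$")

Triple = tuple[str, str, str]


def parse_ntriples(text: str) -> set[Triple]:
    """Parse the simplified N-Triples subset above into a set of triples."""
    triples: set[Triple] = set()
    for line in text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and comments
        m = NT_LINE.match(line)
        if m is None:
            raise ValueError(f"unsupported N-Triples line: {line!r}")
        triples.add(m.groups())
    return triples


class QuadStore:
    """Toy triple store that keeps each dataset in its own named graph."""

    def __init__(self) -> None:
        self.graphs: dict[str, set[Triple]] = {}

    def load(self, graph: str, ntriples: str) -> None:
        """Load a dataset into `graph`, replacing any previous version of that graph only."""
        self.graphs[graph] = parse_ntriples(ntriples)

    def quads(self):
        """Yield (subject, predicate, object, graph), so every triple carries its provenance."""
        for graph, triples in self.graphs.items():
            for s, p, o in triples:
                yield s, p, o, graph


# Hypothetical dataset identifiers and records.
PEOPLE = "https://doi.org/10.5555/people"
TAXA = "https://doi.org/10.5555/taxa"

store = QuadStore()
store.load(PEOPLE, '<https://orcid.org/0000-0000-0000-0001> <http://schema.org/name> "A. Person" .')
store.load(TAXA, '<urn:lsid:example.org:names:1> <http://schema.org/name> "Aus bus" .')

# A new release of the people dataset replaces only its own graph.
store.load(PEOPLE, "\n".join([
    '<https://orcid.org/0000-0000-0000-0001> <http://schema.org/name> "A. Person" .',
    '<https://orcid.org/0000-0000-0000-0002> <http://schema.org/name> "B. Person" .',
]))

for quad in sorted(store.quads()):
    print(quad)
```

Keeping the graph name next to every triple is what provides provenance: a query can report which dataset asserted a fact, and reloading one graph leaves the others untouched.
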
With the addition of those cross links we can start to generate bibliographies for taxa, discover communities of taxonomic expertise, and more. This model of building a knowledge graph also opens opportunities for smaller, focussed datasets to be added to the graph using the same approach (as a set of N-Triples archived in an online repository). In order to be useful, a knowledge graph needs to be easy to query and visualise. Simply providing a SPARQL endpoint is unlikely to be enough. As part of this project, I developed a GraphQL interface to the knowledge graph to provide a set of standard queries that can support a simple web interface to the graph. This provides a way to explore the graph as it is being developed, which in turn can highlight gaps in connectivity and coverage that need to be addressed.
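
The effect of a "glue" dataset can be shown with a small join. In this Python sketch the LSIDs, DOIs, titles and the `bibliography` function are hypothetical; the function stands for the kind of fixed, standard query a GraphQL resolver might wrap, not for the project's actual GraphQL schema.

```python
# Island 1: taxonomic names keyed by (hypothetical) LSID.
names = {
    "urn:lsid:example.org:names:1": "Aus bus",
    "urn:lsid:example.org:names:2": "Cus dus",
}

# Island 2: publications keyed by (hypothetical) DOI.
works = {
    "10.5555/aaa": "A revision of the genus Aus",
    "10.5555/bbb": "New species of Aus and Cus",
}

# Glue: cross links from name LSIDs to publication DOIs.
glue = [
    ("urn:lsid:example.org:names:1", "10.5555/aaa"),
    ("urn:lsid:example.org:names:1", "10.5555/bbb"),
    ("urn:lsid:example.org:names:2", "10.5555/bbb"),
]


def bibliography(taxon_name: str) -> list[tuple[str, str]]:
    """Return (DOI, title) pairs for every publication linked to the name."""
    lsids = {lsid for lsid, name in names.items() if name == taxon_name}
    dois = sorted({doi for lsid, doi in glue if lsid in lsids})
    # A DOI with no record in `works` marks a coverage gap rather than an error.
    return [(doi, works.get(doi, "<missing publication record>")) for doi in dois]


for doi, title in bibliography("Aus bus"):
    print(doi, title)
```

Without the `glue` list the two dictionaries share no identifiers, which is the "isolated islands" situation; a DOI that appears in the glue but not among the publications is the kind of coverage gap that exploring the graph can reveal.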

Articles published on Triple Store

FIPs and Practice

Performance benchmark on semantic web repositories for spatially explicit knowledge graph applications

The OpenBiodiv Knowledge Graph Rebuilt: A semantic hub on top of the ARPHA-published content and the Biodiversity Literature Repository

Bootstrapping a Biodiversity Knowledge Graph

Jumping Evaluation of Nested Regular Path Queries

Design of Knowledge Graph Retrieval System for Legal and Regulatory Framework of Multilevel Latent Semantic Indexing

Wukong+G: Fast and Concurrent RDF Query Processing Using RDMA-Assisted GPU Graph Exploration

Toward Mapping an NGSI-LD Context Model on RDF Graph Approaches: A Comparison Study

An Ontology and Data Converter from RDF to the i2b2 Data Model

Context mining and graph queries on giant biomedical knowledge graphs

PreKar: A learned performance predictor for knowledge graph stores

Creating RESTful APIs over SPARQL endpoints using RAMOSE

Querying graph databases using context-free grammars

A design space for RDF data representations

Interoperability and Integration: An Updated Approach to Linked Data Publication at the Dutch Land Registry

Practical Near-Data-Processing Architecture for Large-Scale Distributed Graph Neural Network

Incremental Knowledge Extraction From IoT-Based System for Anomaly Detection in Vegetation Crops

CRAFTS: Configurable REST APIs for Triple Stores

Cloud-Based Framework for Spatio-Temporal Trajectory Data Segmentation and Query

A survey of RDF stores & SPARQL engines for querying knowledge graphs
