Interoperable chemical structure search service

Miroslav Kratochvíl, Jiří Vondrášek, Jakub Galgonek

J Cheminform (2019) 11:45
doi:10.1186/s13321-019-0367-2

Abstract

Motivation: The existing connections between large databases of chemicals, proteins, metabolites and assays offer valuable resources for research in fields ranging from drug design to metabolomics. Transparent search across multiple databases provides a way to use these resources efficiently. To simplify such searches, many databases have adopted semantic technologies that allow interoperable querying of the datasets using the SPARQL query language. However, the interoperable interfaces of chemical databases still lack structure-driven chemical search, which is a fundamental method of data discovery in the chemical search space.

Results: We present a SPARQL service that augments existing semantic services by making interoperable substructure and similarity searches in small-molecule databases possible. The service thus offers new possibilities for querying interoperable databases and simplifies the writing of heterogeneous queries that include chemical-structure search terms.

Availability: The service is freely available and accessible through a standard SPARQL endpoint interface. The service documentation and user-oriented demonstration interfaces that allow quick exploratory querying of the datasets are available at https://idsm.elixir-czech.cz.
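Because the service is reached through a standard SPARQL endpoint, any HTTP client that follows the W3C SPARQL 1.1 Protocol can query it, and results can be requested in the standard SPARQL JSON results format. The sketch below prepares (but does not send) a query-via-GET request and flattens a JSON SELECT result into rows. The endpoint URL and the `sachem:` vocabulary (prefix IRI, `substructureSearch`, `query`) are assumptions for illustration only; the service documentation at https://idsm.elixir-czech.cz describes the actual endpoint paths and predicates.

```python
import json
from typing import Dict, List, Optional
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint URL, for illustration only.
ENDPOINT = "https://idsm.elixir-czech.cz/sparql/endpoint/chembl"

# Assumed Sachem vocabulary; check the prefix IRI and predicate names
# against the service documentation before relying on them.
SUBSTRUCTURE_QUERY = """\
PREFIX sachem: <http://bioinfo.uochb.cas.cz/rdf/v1.0/sachem#>
SELECT ?compound WHERE {
  ?compound sachem:substructureSearch [ sachem:query %s ] .
}
LIMIT %d
"""


def sparql_literal(text: str) -> str:
    """Quote text as a SPARQL string literal (SMILES may contain '\\')."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_sparql_get(endpoint: str, query: str) -> Request:
    """Prepare a SPARQL 1.1 Protocol query-via-GET request.

    The query text is URL-encoded into the `query` parameter, and the Accept
    header asks for the standard SPARQL 1.1 JSON results format.
    """
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def substructure_request(smiles: str, limit: int = 10) -> Request:
    # SMILES may contain '#', '=' or '%'; urlencode escapes them for the URL.
    query = SUBSTRUCTURE_QUERY % (sparql_literal(smiles), limit)
    return build_sparql_get(ENDPOINT, query)


def parse_select_results(text: str) -> List[Dict[str, Optional[str]]]:
    """Flatten a SPARQL JSON SELECT result into rows of lexical values.

    Variables left unbound in a solution map to None. ASK results carry a
    "boolean" member instead of "results" and are not handled here.
    """
    doc = json.loads(text)
    variables = doc["head"]["vars"]
    return [{v: b.get(v, {}).get("value") for v in variables}
            for b in doc["results"]["bindings"]]


req = substructure_request("CC(=O)Oc1ccccc1C(O)=O")  # aspirin as the query
# To actually send it: urllib.request.urlopen(req).read().decode("utf-8"),
# then pass the response body to parse_select_results().
```

Keeping request construction separate from sending makes the sketch easy to inspect offline; a real client would also handle HTTP errors and timeouts.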

Highlights

The vast availability of research-related data sources on the Internet has created a need for tools that can efficiently search through these data and automatically collect and associate information from multiple interoperable sources
A running instance of the interoperable structure search service is accessible at the SPARQL endpoint https://idsm.elixir-czech.cz/sparql/endpoint/ followed by a database name, where the available databases comprise pubchem, drugbank, chebi and chembl
We present two use cases: (1) a single-purpose chemical structure search application called ‘Sachem GUI,’ which serves as an example of using an interoperable chemical search in a web application, and (2) the general application ‘SPARQL GUI,’ which serves as a tool for constructing and running heterogeneous, federated queries that employ the service
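Heterogeneous, federated queries of this kind are typically written with the SPARQL 1.1 SERVICE keyword, which delegates part of a query to a remote endpoint. The sketch below assembles such a query as a string: the structure-search pattern is sent to a structure search endpoint, while the label lookup is evaluated by whichever endpoint receives the whole query. The endpoint URL, the `sachem:` predicates, and the use of `rdfs:label` for compound names are illustrative assumptions, not the service's documented schema.

```python
# Hypothetical endpoint URL and assumed vocabulary (see the lead-in).
STRUCTURE_ENDPOINT = "https://idsm.elixir-czech.cz/sparql/endpoint/chebi"
SACHEM = "http://bioinfo.uochb.cas.cz/rdf/v1.0/sachem#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"


def sparql_literal(text: str) -> str:
    """Quote text as a SPARQL string literal, escaping backslashes and quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def federated_query(smiles: str, limit: int = 20) -> str:
    """Build a SPARQL 1.1 query whose structure-search part is delegated to a
    remote endpoint with SERVICE, while the label lookup is evaluated by the
    endpoint that receives the whole query."""
    return "\n".join([
        f"PREFIX sachem: <{SACHEM}>",
        f"PREFIX rdfs: <{RDFS}>",
        "SELECT ?compound ?label WHERE {",
        f"  SERVICE <{STRUCTURE_ENDPOINT}> {{",
        "    ?compound sachem:substructureSearch",
        f"        [ sachem:query {sparql_literal(smiles)} ] .",
        "  }",
        "  ?compound rdfs:label ?label .",
        "}",
        f"LIMIT {limit}",
    ])


query = federated_query("c1ccccc1O")  # phenol as the query substructure
```

The join variable `?compound` is what ties the remote structure-search results to the local data; the query is only meaningful if both sides identify compounds with the same IRIs.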

Summary

Introduction

The vast availability of research-related data sources on the Internet has created a need for tools that can efficiently search through these data and automatically collect and associate information from multiple interoperable sources. However, comprehensive information about structural relations is largely absent in chemistry-oriented RDF datasets. This limits opportunities to observe relationships with other linked information, such as proteins and metabolic pathways that are connected by chemically similar ligands or metabolites. To address this issue, we developed a publicly available service that augments the interoperable chemical search space by providing chemical similarity and substructure relations. This creates new ways to query currently available data sources and to obtain more precise, meaningful data from them.
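This passage does not state which similarity measure the service computes. A common choice in cheminformatics is the Tanimoto (Jaccard) coefficient on binary fingerprints, T(A, B) = |A ∩ B| / |A ∪ B|, where A and B are the sets of bits set in the two fingerprints. The sketch below assumes fingerprints are already available as sets of bit indices (how they are computed is out of scope) and returns the compounds whose similarity to a query reaches a cutoff; the compound identifiers and bit sets are made up for illustration.

```python
from typing import Dict, FrozenSet, List, Tuple

Fingerprint = FrozenSet[int]  # indices of the bits set to 1


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient |A & B| / |A | B| of two bit sets."""
    union = len(a | b)
    if union == 0:
        return 0.0  # two empty fingerprints: define the similarity as 0
    return len(a & b) / union


def similarity_search(query: Fingerprint,
                      library: Dict[str, Fingerprint],
                      cutoff: float = 0.8) -> List[Tuple[str, float]]:
    """Return (compound id, score) pairs with score >= cutoff,
    ordered from most to least similar (ties broken by id)."""
    scored = [(cid, tanimoto(query, fp)) for cid, fp in library.items()]
    hits = [(cid, s) for cid, s in scored if s >= cutoff]
    return sorted(hits, key=lambda h: (-h[1], h[0]))


# Made-up fingerprints, for illustration only.
library = {
    "cpd-A": frozenset({1, 4, 7, 9}),
    "cpd-B": frozenset({1, 4, 7}),
    "cpd-C": frozenset({2, 3, 5}),
}
hits = similarity_search(frozenset({1, 4, 7, 9}), library, cutoff=0.5)
```

This linear scan only illustrates the measure; a search over a large database would normally rely on an index rather than scoring every compound.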

Objectives

Methods

Results

Discussion

Conclusion