GRAPES-DD: exploiting decision diagrams for index-driven search in biological graph databases

Nicola Licheri,Rosalba Giugno,Marco Beccuti,Vincenzo Bonnici

doi:10.1186/s12859-021-04129-0

Open Access

https://doi.org/10.1186/s12859-021-04129-0

Journal: BMC Bioinformatics
Publication Date: Apr 22, 2021
Citations: 2
License type: open-access

Affiliation: University of Turin, University of Verona

Abstract

Background: Graphs are mathematical structures widely used for expressing relationships among elements when representing biomedical and biological information. On top of these representations, several analyses are performed. A common task is the search for one substructure within one graph, called the target. The problem is referred to as one-to-one subgraph search, and it is known to be NP-complete. Heuristics and indexing techniques can be applied to facilitate the search. Indexing techniques are also exploited when searching in a collection of target graphs, referred to as the one-to-many subgraph problem. Filter-and-verification methods that use indexing approaches quickly prune target graphs, or parts of them, that do not contain the query. The expensive verification phase is then performed only on the subset of promising targets. Indexing strategies extract graph features at a level of granularity sufficient for a powerful filtering step. Features are stored in data structures that allow efficient access. Index size, querying time and filtering power are key points in the development of efficient subgraph searching solutions.

Results: An existing approach, GRAPES, has been shown to perform well in terms of speed-up for both the one-to-one and the one-to-many case. However, it suffers from the size of the index it builds. For this reason, we propose GRAPES-DD, a modified version of GRAPES in which the indexing structure has been replaced with a Decision Diagram. Decision Diagrams are a broad class of data structures widely used to encode and manipulate functions efficiently. Experiments on biomedical structures and synthetic graphs have confirmed our expectation, showing that GRAPES-DD substantially reduces memory utilization compared to GRAPES without worsening the searching time.

Conclusion: The use of Decision Diagrams for searching in biochemical and biological graphs is completely new and potentially promising, thanks to their ability to encode sets compactly by exploiting their structure and regularity, and to manipulate entire sets of elements at once instead of exploring each single element explicitly. Search strategies based on Decision Diagrams make indexing more affordable for biochemical graphs and beyond, allowing us to potentially deal with huge and ever-growing collections of biochemical and biological structures.
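
The compact set encoding mentioned in the abstract can be illustrated with a minimal sketch. It is not the GRAPES-DD implementation: the function names `build_dd` and `contains`, the use of `-1` as the accepting terminal, and the restriction to equal-length label tuples are all assumptions made for illustration. The sketch stores a set of label sequences as a directed acyclic graph in which identical sub-structures are created only once.

```python
def build_dd(tuples):
    """Build a reduced decision diagram for a set of equal-length label tuples.

    Returns (root, nodes): nodes[i] maps a label to a child id, and the
    id -1 stands for the accepting terminal. (Illustrative layout only.)
    """
    unique = {}  # node signature -> node id, so equal sub-diagrams are shared
    nodes = []   # node id -> {label: child id}

    def make(suffixes):
        if suffixes == frozenset({()}):
            return -1  # every tuple has been fully consumed
        groups = {}
        for s in suffixes:
            groups.setdefault(s[0], set()).add(s[1:])
        sig = tuple(sorted((lab, make(frozenset(rest)))
                           for lab, rest in groups.items()))
        if sig not in unique:  # create the node only if it is new
            unique[sig] = len(nodes)
            nodes.append(dict(sig))
        return unique[sig]

    return make(frozenset(tuples)), nodes


def contains(root, nodes, tup):
    """Check membership by following one edge per label."""
    node = root
    for lab in tup:
        if node == -1 or lab not in nodes[node]:
            return False
        node = nodes[node][lab]
    return node == -1


# Four label paths of three nodes each; all of them end in "O".
paths = {("C", "C", "O"), ("C", "N", "O"), ("N", "C", "O"), ("N", "N", "O")}
root, nodes = build_dd(paths)
print(len(nodes))                                # 3
print(contains(root, nodes, ("C", "N", "O")))    # True
print(contains(root, nodes, ("C", "O", "O")))    # False
```

For these four paths a plain trie needs seven internal nodes (one root, two at the second level, four at the third), while the shared representation needs three, because the sub-structures below `C` and `N` are identical.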

Highlights

Graphs are a well-known mathematical structure used to encode relationships among elements of a set.
Biochemical structures: the collection of biochemical graphs was initially used for evaluating the performance of one-to-one subgraph isomorphism algorithms [60] and is now a well-established benchmark for graph theory problems linked to subgraph isomorphism [61].
In this work we investigated the possibility of improving the performance of cutting-edge index-based algorithms for searching substructures in graphs by addressing one of their disadvantages, namely the size of the index.

Summary

Introduction

Graphs are a well-known mathematical structure used to encode relationships among elements of a set. Alignment can exploit the search for small subgraphs, called seeds, within the set of networks to be aligned, in order to reduce computational time requirements [15]. Other alignment tools, such as RINQ [16], use indexing schemes. Indexing techniques are exploited when searching in a collection of target graphs, referred to as the one-to-many subgraph problem.

Path-based graph indexing: graph indexing strategies based on labelled paths consist of extracting all the paths in the graphs up to a given length (the number of nodes of which they are composed) and storing them compactly in a data structure [27, 28, 49, 50]. These techniques show good performance in terms of filtering power and construction/querying time. By comparing the ordered sequences of labels and the counts of their occurrences, GRAPES effectively filters out target graphs that do not contain the query graph.
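
The path-based filtering idea described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the GRAPES code: the names `path_index` and `may_contain` are hypothetical, graphs are given as adjacency dictionaries, and only simple paths (no repeated node) are indexed. A target survives the filter only if every labelled path of the query occurs in it at least as many times as in the query. Surviving targets would still need the expensive verification step.

```python
from collections import Counter


def path_index(adj, labels, max_len):
    """Count labelled simple paths made of at most max_len nodes.

    adj:    dict mapping each node to an iterable of its neighbours
    labels: dict mapping each node to its label
    """
    counts = Counter()

    def extend(path):
        # Record the label sequence of the current path.
        counts[tuple(labels[v] for v in path)] += 1
        if len(path) == max_len:
            return
        for nxt in adj[path[-1]]:
            if nxt not in path:  # keep paths simple
                extend(path + [nxt])

    for start in adj:
        extend([start])
    return counts


def may_contain(target_index, query_index):
    """Filtering test: False means the target cannot contain the query."""
    return all(target_index[p] >= c for p, c in query_index.items())


# Hypothetical toy graphs: chains C-C-O and C-C-N as targets, C-O as query.
target_a = path_index({0: [1], 1: [0, 2], 2: [1]}, {0: "C", 1: "C", 2: "O"}, 3)
target_b = path_index({0: [1], 1: [0, 2], 2: [1]}, {0: "C", 1: "C", 2: "N"}, 3)
query = path_index({0: [1], 1: [0]}, {0: "C", 1: "O"}, 3)

print(may_contain(target_a, query))  # True: kept for verification
print(may_contain(target_b, query))  # False: pruned, it has no O-labelled node
```

Comparing counts rather than mere presence gives the filter more pruning power, at the cost of storing one counter per label sequence and per target graph, which is the part of the index whose size grows quickly.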

Methods

Results

Discussion

Conclusion