(Hyper)Graph Embedding and Classification via Simplicial Complexes

Alessio Martino, Alessandro Giuliani, Antonello Rizzi

doi:10.3390/a12110223

Open Access

Abstract

This paper investigates a novel graph embedding procedure based on simplicial complexes. Inherited from algebraic topology, simplicial complexes are collections of simplices of increasing order (e.g., points, lines, triangles, tetrahedra) which can be interpreted as possibly meaningful substructures (i.e., information granules) on top of which an embedding space can be built by means of symbolic histograms. In the embedding space, any Euclidean pattern recognition system can be used, possibly equipped with feature selection capabilities in order to select the most informative symbols. The selected symbols can be analysed by field experts in order to extract further knowledge about the process to be modelled by the learning system; hence, the proposed modelling strategy can be considered a grey-box approach. The proposed embedding has been tested on thirty benchmark datasets for graph classification. Further, we propose two real-world applications, namely predicting proteins’ enzymatic function and solubility propensity starting from their 3D structure, in order to give an example of the knowledge discovery phase that can be carried out starting from the proposed embedding strategy.
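
The symbolic-histogram idea described in the abstract can be illustrated with a minimal sketch. It rests on assumptions that are not stated in the text above: the simplicial complex is taken to be the clique complex of the graph, each simplex is turned into a symbol by sorting the labels of its nodes, and the function names (`clique_simplices`, `symbol_counts`, `embed`) are hypothetical. The brute-force enumeration is only meant for very small graphs.

```python
from collections import Counter
from itertools import combinations


def clique_simplices(edges, max_dim=2):
    """List simplices (as node tuples) of the clique complex up to max_dim.

    Assumption: a k-simplex is any set of k+1 mutually adjacent nodes.
    Nodes that appear in no edge are ignored in this sketch.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    nodes = sorted(adj)
    simplices = [(n,) for n in nodes]  # 0-simplices (points)
    for size in range(2, max_dim + 2):  # edges, triangles, ...
        for cand in combinations(nodes, size):
            if all(b in adj[a] for a, b in combinations(cand, 2)):
                simplices.append(cand)
    return simplices


def symbol_counts(edges, labels, max_dim=2):
    """Count symbols, where a symbol is the sorted tuple of node labels."""
    return Counter(
        tuple(sorted(labels[n] for n in simplex))
        for simplex in clique_simplices(edges, max_dim)
    )


def embed(graphs, max_dim=2):
    """Map each (edges, labels) graph to a histogram over a shared alphabet."""
    counts = [symbol_counts(e, l, max_dim) for e, l in graphs]
    alphabet = sorted(set().union(*counts))
    vectors = [[c[s] for s in alphabet] for c in counts]
    return alphabet, vectors


# Two toy labelled graphs: a triangle and a three-node path.
triangle = ([(0, 1), (1, 2), (0, 2)], {0: "C", 1: "C", 2: "O"})
path = ([(0, 1), (1, 2)], {0: "C", 1: "O", 2: "C"})
alphabet, vectors = embed([triangle, path])
```

Each row of `vectors` is a fixed-length integer vector, so any Euclidean classifier can be trained on it, and a feature selection step would amount to keeping only some entries of `alphabet`.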

Highlights

Graphs are powerful data structures that can capture topological and semantic information from data
The aim of this paper is to investigate a novel procedure for extracting meaningful information granules by means of simplicial complexes
Proposed in Ref. [78], the Weighted Jaccard Kernel (WJK) is a hypergraph kernel working on top of the simplicial complexes built from the underlying graphs

Summary

Introduction

Graphs are powerful data structures that can capture topological and semantic information from data. This is one of the main reasons they are commonly used for modelling several real-world systems in fields such as biology and chemistry [1,2,3,4,5,6,7,8], social networks [9], and telecommunication networks [10,11]. Solving pattern recognition problems in structured domains such as graphs poses additional challenges. Many structured domains are non-metric in nature [15,16,17] and their patterns lack any geometrical interpretation. An input space is said to be non-metric if the pairwise dissimilarities between patterns lying in that space do not satisfy the properties of a metric (non-negativity, identity, symmetry and triangle inequality) [17,18].
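
The four metric properties listed above can be checked numerically on a pairwise dissimilarity matrix. The following is a minimal sketch rather than anything taken from the paper: the function name `metric_properties` and the tolerance `tol` are hypothetical, and the identity check assumes that all patterns in the matrix are distinct.

```python
def metric_properties(D, tol=1e-12):
    """Report which metric properties a square dissimilarity matrix satisfies.

    D is a list of lists with D[i][j] the dissimilarity between patterns i and j.
    Assumption: the patterns are pairwise distinct, so identity requires a zero
    diagonal and strictly positive off-diagonal entries.
    """
    idx = range(len(D))
    return {
        "non_negativity": all(D[i][j] >= -tol for i in idx for j in idx),
        "identity": all(abs(D[i][i]) <= tol for i in idx)
        and all(D[i][j] > tol for i in idx for j in idx if i != j),
        "symmetry": all(abs(D[i][j] - D[j][i]) <= tol for i in idx for j in idx),
        "triangle_inequality": all(
            D[i][k] <= D[i][j] + D[j][k] + tol
            for i in idx for j in idx for k in idx
        ),
    }


# Points 0, 1, 2 on a line: the absolute difference is a metric,
# while the squared difference violates the triangle inequality.
euclidean = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
squared = [[0, 1, 4], [1, 0, 1], [4, 1, 0]]
report_euclidean = metric_properties(euclidean)
report_squared = metric_properties(squared)
```

In this toy case the squared dissimilarity fails only the triangle inequality (4 > 1 + 1), which is enough to make the space non-metric in the sense defined above.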

Objectives

Results

Discussion

Conclusion