An Enhanced Filtering-Based Information Granulation Procedure for Graph Embedding and Classification

Alessio Martino, Antonello Rizzi

doi:10.1109/access.2021.3053085

Open Access

https://doi.org/10.1109/access.2021.3053085

Abstract

Granular Computing is a powerful information processing paradigm for synthesizing advanced pattern recognition systems in non-conventional domains. In this article, a novel procedure for the automatic synthesis of suitable information granules is proposed. The procedure leverages a joint sensitivity-vs-specificity score that accounts for the meaningfulness of candidate information granules for each class considered in the classification problem at hand. Only statistically relevant granules are retained for a graph embedding procedure towards a geometric space, in which standard classification systems can be used without alterations. Performance tests have been carried out on open access datasets of fully labelled graphs with arbitrarily complex node and/or edge attributes that, by definition, must rely on inexact graph matching procedures to quantify dissimilarities. Two variants of the procedure are investigated: a standard variant, which aims at automatically finding suitable information granules for solving the classification problem as a whole, and a class-specific metric learning variant, in which the optimization procedure is performed in a class-aware fashion. In the latter case, each class has its own set of information granules, along with the corresponding parameters defining distinct instances of the dissimilarity measure. Computational results show that the proposed algorithm is able to outperform the vast majority of current approaches for graph classification, while at the same time returning a grey-box model that is interpretable by field experts.

Highlights

Graph embedding is one of the mainstream approaches when dealing with pattern recognition problems in the graph domain
This article features two appendices: in Appendix A we describe in detail the inexact graph matching procedure between labelled graphs, whereas in Appendix B we describe in detail the dissimilarities between nodes and edges for all considered datasets
The filtering operation relies on a unified index called INDVAL, which accounts for both the specificity and the sensitivity of substructures stochastically drawn from the training data with respect to the problem-related classes, with the final goal of electing as information granules only the substructures endowed with the highest discriminative power
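
To make the filtering idea concrete, the following is a minimal sketch of a joint specificity-times-sensitivity score for a single candidate substructure. It assumes a presence/absence formulation in the spirit of the classic indicator value: sensitivity is the fraction of graphs of a class in which the substructure is found, and specificity is that fraction normalized by its sum over all classes. The function names `indval_scores` and `keep_granule`, the boolean `presence` input and the `threshold` parameter are hypothetical, and the exact definition used in the article may differ.

```python
from collections import defaultdict


def indval_scores(presence, labels):
    """Per-class score for one candidate substructure (assumed formulation).

    presence[i] is True when the substructure was matched in training
    graph i; labels[i] is the class of that graph.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for hit, label in zip(presence, labels):
        totals[label] += 1
        hits[label] += int(hit)

    # Sensitivity: fraction of the graphs of each class containing the substructure.
    sensitivity = {c: hits[c] / totals[c] for c in totals}
    norm = sum(sensitivity.values())

    scores = {}
    for c in totals:
        # Specificity: how concentrated the substructure is in class c.
        specificity = sensitivity[c] / norm if norm > 0 else 0.0
        scores[c] = specificity * sensitivity[c]
    return scores


def keep_granule(scores, threshold):
    """Retain the candidate if it is discriminative for at least one class."""
    return max(scores.values()) >= threshold


# Toy data: the candidate appears in every graph of class "a"
# and in one graph out of three of class "b".
presence = [True, True, True, False, False, True]
labels = ["a", "a", "a", "b", "b", "b"]
scores = indval_scores(presence, labels)
print(scores, keep_granule(scores, threshold=0.5))
```

In this toy case the candidate scores highly for class "a" only, so a threshold of 0.5 would retain it as a granule because of its relevance to that class.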

Summary

INTRODUCTION

Graph embedding is one of the mainstream approaches when dealing with pattern recognition problems in the graph domain. In this work we extend the proposed methodology to fully labelled graphs (i.e., graphs with arbitrarily complex attributes on both nodes and edges), where an inexact matching procedure is mandatory. More in detail, the statistical index used for filtering (INDVAL) aims at letting statistically relevant information granules emerge from the training data. We present two variants of the proposed system: a standard variant, which seeks to automatically tune suitable global parameters using an evolutionary metaheuristic, and a class-specific metric learning variant, where a swarm-based evolutionary optimization aims at finding suitable system parameters in a class-aware fashion, thereby exploiting the ground-truth class information. Both variants are equipped with feature selection capabilities in order to return a (possibly) small, yet informative, set of information granules. This article features two appendices: Appendix A describes in detail the inexact graph matching procedure between labelled graphs, whereas Appendix B describes in detail the dissimilarities between nodes and edges for all considered datasets.
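
As an illustration of how a set of retained information granules can map a graph to a vector, here is a minimal sketch under stated assumptions. It assumes a counting-based embedding: each vector component counts how many substructures of the graph match one granule within a tolerance. The text above only states that granules drive an embedding towards a geometric space, so this counting scheme, the function `embed`, the `tolerance` parameter and the toy `jaccard_distance` (standing in for the inexact graph matching dissimilarity) are all hypothetical.

```python
def jaccard_distance(a, b):
    """Toy dissimilarity between two substructures given as sets of edge labels.

    Stand-in for an inexact graph matching dissimilarity.
    """
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def embed(substructures, granules, dissimilarity, tolerance):
    """Map one graph to a vector with one component per granule.

    Each component counts the substructures of the graph whose
    dissimilarity from the granule does not exceed the tolerance.
    """
    return [
        sum(1 for s in substructures if dissimilarity(s, g) <= tolerance)
        for g in granules
    ]


# Two granules and one graph described by three substructures.
granules = [frozenset({"C-O", "C-C"}), frozenset({"N-H"})]
graph_parts = [
    frozenset({"C-O", "C-C"}),
    frozenset({"C-O"}),
    frozenset({"N-H"}),
]
vector = embed(graph_parts, granules, jaccard_distance, tolerance=0.5)
print(vector)  # [2, 1]
```

Once every graph is represented by such a fixed-length vector, any standard classifier operating on real-valued vectors can be trained on the embedded data. A class-aware variant could, under the same assumptions, use a separate granule set and tolerance for each class.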

PROPOSED METHODOLOGY

Findings

CONCLUSION