Subjectively interesting connecting trees and forests

Florian Adriaens, Tijl De Bie, Jefrey Lijffijt

doi:10.1007/s10618-019-00627-1

Abstract

Consider a large graph or network, and a user-provided set of query vertices between which the user wishes to explore relations. For example, a researcher may want to connect research papers in a citation network, an analyst may wish to connect organized crime suspects in a communication network, or an internet user may want to organize their bookmarks given their location in the world wide web. A natural way to do this is to connect the vertices in the form of a tree structure that is present in the graph. However, in sufficiently dense graphs, most such trees will be large or somehow trivial (e.g. involving high-degree vertices) and thus not insightful. Extending previous research, we define and investigate the new problem of mining subjectively interesting trees connecting a set of query vertices in a graph, i.e., trees that are highly surprising to the specific user at hand. Using information-theoretic principles, we formalize the notion of interestingness of such trees mathematically, taking into account certain prior beliefs the user has specified about the graph. A remaining problem is efficiently fitting a prior belief model. We show how this can be done for a large class of prior beliefs. Given a specified prior belief model, we then propose heuristic algorithms to find the best trees efficiently. An empirical validation of our methods on large real graphs evaluates the different heuristics and validates the interestingness of the given trees.
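The idea of scoring a connecting tree by how surprising it is can be illustrated with a small sketch. This is not the authors' formulation. It assumes, purely for illustration, that the background distribution assigns an independent probability to each edge, that the information content of a tree is the sum of the negative log-probabilities of its edges, and that the description length grows linearly with the number of edges. The names `edge_prob`, `cost_per_edge`, `information_content`, and `interestingness` are hypothetical.

```python
import math

def information_content(tree_edges, edge_prob):
    """Sum of -log2 p(e) over the tree's edges.

    Assumption: edges are independent under the background distribution,
    so the surprise of the whole tree is the sum of per-edge surprises.
    """
    return sum(-math.log2(edge_prob[e]) for e in tree_edges)

def interestingness(tree_edges, edge_prob, cost_per_edge=1.0):
    """Information content divided by an assumed description length.

    Assumption: the description length is proportional to the number
    of edges in the tree (cost_per_edge is a hypothetical constant).
    """
    description_length = cost_per_edge * len(tree_edges)
    return information_content(tree_edges, edge_prob) / description_length

# Toy background model: edges through a high-degree "hub" are expected
# (probability 0.5), edges through vertex "c" are unexpected (0.125).
edge_prob = {
    ("a", "hub"): 0.5,
    ("hub", "b"): 0.5,
    ("a", "c"): 0.125,
    ("c", "b"): 0.125,
}

# Two candidate trees connecting the query vertices "a" and "b".
via_hub = [("a", "hub"), ("hub", "b")]
via_c = [("a", "c"), ("c", "b")]

candidates = {"via_hub": via_hub, "via_c": via_c}
best = max(candidates, key=lambda name: interestingness(candidates[name], edge_prob))
```

Under these assumptions the tree through the hub scores lower than the tree through `c`, which matches the abstract's remark that trees involving high-degree vertices tend to be uninformative.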

Highlights

Often, data presents itself in the form of a graph, be it edge- or vertex-annotated or not, weighted or unweighted, directed or undirected.
The particular approach presented in this paper adds a third important distinctive aspect, and in this way distinguishes itself from Akoglu et al (2013), which is most directly related to our work (see Sect. 9): it aims to ensure that the answer to this question is subjectively interesting to the user, i.e., it takes into account the prior beliefs the user holds about the graph.
Zhou et al (2010) introduced the idea of simplifying weighted networks by pruning the least important edges from them. They assume the number of edges to be removed is a parameter, and the result will not always be a tree. They are not concerned with a set of query vertices or a subjective interestingness measure.

Summary

Introduction

Often, data presents itself in the form of a graph, be it edge- or vertex-annotated or not, weighted or unweighted, directed or undirected. It is in this regard that the notion of subjective interestingness was formalised (Silberschatz and Tuzhilin 1996) and that, more recently, the data mining framework FORSIED that we build upon was created (De Bie 2011a, 2013). This framework specifies in general terms how to model the prior beliefs the user has about the data.

– We define the new problem of finding subjectively interesting trees and forests connecting a set of query vertices in a graph.
– We propose heuristics for mining the most interesting trees efficiently, both for undirected and directed graphs.

We achieved this by providing methods for finding connecting trees, forests, and branchings. This broadens applicability to both undirected and directed graphs, and allows for a possible partitioning of the query vertices. The efficiency of our proposed methods is tested in more detail on a wider variety of graph types, and a direct comparison with an existing related method is added (Sect. 8).

Subjectively interesting trees in graphs

Notation and terminology

A subjective interestingness measure

The information content and inferring the background distribution

Prior beliefs on the density of sets of edges

What if there is a mismatch between the stated and actual prior beliefs?

Identifying equivalent Lagrange multipliers

Discussing two prior belief types in more detail

Prior beliefs when vertices represent timed events

Prior beliefs on degree assortativity

A fast heuristic for identifying equivalent Lagrange multipliers

Algorithms for finding the most interesting trees

Proposed methods for finding arborescences

Experiments

Fitting the different background models

Testing the relative performance of the heuristics

Scalability with varying tree depth

Testing the influence of the prior belief model on the resulting trees

Subjective evaluation

Related work

Findings

Concluding remarks

Journal: Data Mining and Knowledge Discovery
Publication Date: Apr 11, 2019
Citations: 11
License type: open-access
