Abstract

Consider a large graph or network, and a user-provided set of query vertices between which the user wishes to explore relations. For example, a researcher may want to connect research papers in a citation network, an analyst may wish to connect organized crime suspects in a communication network, or an internet user may want to organize their bookmarks given their location in the World Wide Web. A natural way to do this is to connect the vertices in the form of a tree structure that is present in the graph. However, in sufficiently dense graphs, most such trees will be large or somehow trivial (e.g. involving high-degree vertices) and thus not insightful. Extending previous research, we define and investigate the new problem of mining subjectively interesting trees connecting a set of query vertices in a graph, i.e., trees that are highly surprising to the specific user at hand. Using information-theoretic principles, we formalize the notion of interestingness of such trees mathematically, taking into account certain prior beliefs the user has specified about the graph. A remaining problem is efficiently fitting a prior belief model. We show how this can be done for a large class of prior beliefs. Given a specified prior belief model, we then propose heuristic algorithms to find the best trees efficiently. An empirical validation of our methods on large real graphs evaluates the different heuristics and validates the interestingness of the resulting trees.
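
To make the idea of a "surprising" tree concrete, the following is a minimal sketch and not the paper's actual formulation. It assumes that the user's prior beliefs have already been turned into an independent probability for each edge, and it scores a tree by its information content (the sum of -log2 p over its edges) divided by a crude description length. The function names, the constant per-edge description cost, and the example probabilities are all hypothetical.

```python
import math

def information_content(tree_edges, edge_prob):
    """Information content in bits: sum of -log2 p(e) over the tree's edges.

    Assumes (for illustration only) that edges are independent under the
    user's background distribution.
    """
    return sum(-math.log2(edge_prob[e]) for e in tree_edges)

def interestingness(tree_edges, edge_prob, bits_per_edge=1.0):
    """Information content divided by a description length.

    The constant cost per edge is a hypothetical stand-in for a real
    description-length model.
    """
    description_length = bits_per_edge * len(tree_edges)
    return information_content(tree_edges, edge_prob) / description_length

# Hypothetical background probabilities: the user expects edges to the
# high-degree hub "h", but not edges to the low-degree vertex "x".
edge_prob = {
    ("a", "h"): 0.5, ("b", "h"): 0.5, ("c", "h"): 0.5,
    ("a", "x"): 0.125, ("x", "b"): 0.125, ("x", "c"): 0.125,
}

# Two trees connecting the same query vertices a, b, c.
hub_tree = [("a", "h"), ("b", "h"), ("c", "h")]
rare_tree = [("a", "x"), ("x", "b"), ("x", "c")]

print(interestingness(hub_tree, edge_prob))   # tree through the expected hub
print(interestingness(rare_tree, edge_prob))  # tree through unexpected edges
```

Under these assumed probabilities, the tree through the hub scores lower than the tree through the unexpected vertex, which mirrors the abstract's remark that trees involving high-degree vertices tend to be trivial.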

Highlights

  • Often, data presents itself in the form of a graph, be it edge- or vertex-annotated or not, weighted or unweighted, directed or undirected

  • The particular approach presented in this paper adds a third important distinctive aspect (and in this way it distinguishes itself from Akoglu et al (2013), which is most directly related to our work; see Sect. 9): it aims to ensure that the answer to this question is subjectively interesting to the user, i.e., that it takes into account the prior beliefs the user holds about the graph

  • Zhou et al (2010) introduced the idea of simplifying weighted networks by pruning the least important edges from them. They assume the number of edges to be removed is a parameter, and the result will not always be a tree. They are not concerned with a set of query vertices or a subjective interestingness measure

Summary

Introduction

Often, data presents itself in the form of a graph, be it edge- or vertex-annotated or not, weighted or unweighted, directed or undirected. It is in this regard that the notion of subjective interestingness was formalised (Silberschatz and Tuzhilin 1996), and more recently the data mining framework FORSIED, which we build upon, was created (De Bie 2011a, 2013). This framework specifies in general terms how to model prior beliefs the user has about the data.

  • We define the new problem of finding subjectively interesting trees and forests connecting a set of query vertices in a graph

  • We propose heuristics for mining the most interesting trees efficiently, both for undirected and directed graphs

We achieved this by providing methods for finding connecting trees, forests, and branchings. This broadens applicability to both undirected and directed graphs, and allows for a possible partitioning of the query vertices. The efficiency of our proposed methods is tested in more detail on a wider variety of graph types, and a direct comparison with an existing related method is added (Sect. 8).

Subjectively interesting trees in graphs
Notation and terminology
A subjective interestingness measure
The information content and inferring the background distribution
Prior beliefs on the density of sets of edges
What if there is a mismatch between the stated and actual prior beliefs?
Identifying equivalent Lagrange multipliers
Discussing two prior belief types in more detail
Prior beliefs when vertices represent timed events
Prior beliefs on degree assortativity
A fast heuristic for identifying equivalent Lagrange multipliers
Algorithms for finding the most interesting trees
Proposed methods for finding arborescences
Experiments
Fitting the different background models
Testing the relative performance of the heuristics
Scalability with varying tree depth
Testing the influence of the prior belief model on the resulting trees
Subjective evaluation
Related work
Findings
Concluding remarks