Large-scale Graph Mining Research Articles

Graphs emerge naturally in many domains, such as social science, neuroscience, transportation engineering, and more. In many cases, such graphs have millions or billions of nodes and edges, and their sizes increase daily at a fast pace. How can researchers from various domains explore large graphs interactively and efficiently to find out what is ‘important’? How can multiple researchers explore a new graph dataset collectively and “help” each other with their findings? In this article, we present Perseus-Hub, a large-scale graph mining tool that computes a set of graph properties in a distributed manner, performs ensemble, multi-view anomaly detection to highlight regions that are worth investigating, and provides users with uncluttered visualization and easy interaction with complex graph statistics. Perseus-Hub uses a Spark cluster to calculate various statistics of large-scale graphs efficiently, and aggregates the results in a summary on the master node to support interactive user exploration. In Perseus-Hub, the visualized distributions of graph statistics provide a preliminary analysis for understanding a graph. To perform a deeper analysis, users with little prior knowledge can leverage patterns (e.g., spikes in the power-law degree distribution) marked by other users or experts. Moreover, Perseus-Hub guides users to regions of interest by highlighting anomalous nodes and helps users establish a more comprehensive understanding of the graph at hand. We demonstrate our system through a case study on real, large-scale networks.
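To make the idea of "spikes in the power-law degree distribution" concrete, here is a minimal single-machine sketch in Python. It is not Perseus-Hub's implementation: the abstract describes a distributed Spark computation and an ensemble, multi-view anomaly detector, neither of which is reproduced here. The function names `degree_distribution` and `flag_spikes`, the `factor` parameter, and the least-squares fit in log-log space are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def degree_distribution(edges):
    """Return {degree: number of nodes with that degree} for an undirected edge list."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return Counter(degree.values())


def flag_spikes(dist, factor=3.0):
    """Flag degrees whose node count exceeds `factor` times a fitted power law.

    The power law is fitted as a straight line through (log degree, log count)
    by ordinary least squares. This is a hypothetical, simplified detector.
    """
    points = [(math.log(d), math.log(c)) for d, c in dist.items() if d > 0 and c > 0]
    n = len(points)
    if n < 3:
        return []  # too few points for a meaningful fit
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in points)
    if var_x == 0:
        return []
    slope = sum((x - mean_x) * (y - mean_y) for x, y in points) / var_x
    intercept = mean_y - slope * mean_x
    threshold = math.log(factor)
    # A positive residual above log(factor) means the observed count is more
    # than `factor` times what the fitted line predicts for that degree.
    return sorted(
        d
        for d, c in dist.items()
        if d > 0 and c > 0
        and math.log(c) - (slope * math.log(d) + intercept) > threshold
    )
```

In use, one would pass the output of `degree_distribution(edges)` to `flag_spikes` and inspect the nodes having the flagged degrees. A single least-squares fit is sensitive to outliers, so a robust fit or a comparison against neighboring degrees may be preferable on real data.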

We present a new approach to large-scale graph mining based on so-called backbone refinement classes. The method efficiently mines tree-shaped subgraph descriptors under minimum frequency and significance constraints, using classes of fragments to reduce feature set size and running times. The classes are defined in terms of fragments sharing a common backbone. The method is able to optimize structural inter-feature entropy as opposed to purely occurrence-based criteria, which is characteristic of open or closed fragment mining. We first give an intuitive explanation of why backbone refinement class features lead to a set of relevant features that are suitable for classification, in particular in the area of structure-activity relationships (SARs). We then show that backbone refinement classes yield a high compression in the search space of rooted perfect binary trees. We conduct several experiments to evaluate our theoretical insights in practice: a visualization suggests low co-occurrence and high entropy of backbone refinement class features. In comparison with a class of patterns sampled from the maximal patterns previously introduced by Al Hasan et al., we find a favorable tradeoff between the structural similarity and the resources needed to compute the descriptors. Cross-validation shows that classification accuracy is similar to that of the complete set of trees but significantly better than that of open trees, while feature set size is reduced by >90% and >30% compared to complete tree mining and open tree mining, respectively. Furthermore, compared to open or closed pattern mining, a large part of the search space can be pruned due to an improved statistical constraint (dynamic upper bound adjustment). This is confirmed experimentally by running times reduced by more than 60% compared to ordinary (static) upper bound pruning.
The application of our method to the largest datasets that have been used in correlated graph mining so far indicates robustness against the minimum frequency parameter, and a cross-validation run on this data confirms that the novel descriptors render large training sets feasible, which previously might have been intractable. A C++ implementation of the mining algorithm is available at http://www.maunz.de/libfminer-doc. Animated figures, links to datasets, and further resources are available at http://www.maunz.de/mlj-res.
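The abstract contrasts its dynamic upper bound adjustment with ordinary (static) upper bound pruning. The sketch below illustrates only the static variant, and only under assumptions the abstract does not confirm: that significance is measured with a chi-square test on a 2x2 contingency table (pattern present or absent versus active or inactive compound), and that the bound is the standard convexity-based one, which evaluates the statistic at the two extreme refinements of a pattern. The function names and the threshold are hypothetical, and this is not the libfminer code.

```python
def chi_square(p, n, P, N):
    """Chi-square statistic for a pattern found in p of P active graphs
    and n of N inactive graphs (2x2 contingency table, no continuity correction)."""
    total = P + N
    support = p + n
    if support == 0 or support == total:
        return 0.0  # pattern occurs nowhere or everywhere: no class information
    return total * (p * N - n * P) ** 2 / (support * (total - support) * P * N)


def upper_bound(p, n, P, N):
    """Upper bound on chi-square for any refinement of the pattern.

    A refinement occurs in at most the same graphs (p' <= p, n' <= n). Because
    the statistic is convex, its maximum over that region lies at one of the
    extremes (p, 0) or (0, n).
    """
    return max(chi_square(p, 0, P, N), chi_square(0, n, P, N))


def can_prune(p, n, P, N, threshold):
    """True if no refinement of the pattern can reach the significance threshold."""
    return upper_bound(p, n, P, N) < threshold
```

For example, with 10 active and 10 inactive graphs and a threshold of 3.84 (roughly the 95% critical value for one degree of freedom), a pattern occurring once in each class can be pruned, while a pattern occurring five times in each class cannot, even though its own statistic is 0. A dynamic scheme would additionally change the threshold during the search; how the paper does so is not stated in the abstract and is not modeled here.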

Articles published on Large-scale Graph Mining

You are your friends: Detecting malware via guilt-by-association and exempt-by-reputation

PERSEUS-HUB: Interactive and Collective Exploration of Large-Scale Graphs

Proxy Graph: Visual Quality Metrics of Big Graph Sampling

Efficient mining for structurally diverse subgraph patterns in large molecular databases
