Outlier edge detection using random graph generation models and applications

Honglei Zhang, Moncef Gabbouj, Serkan Kiranyaz

doi:10.1186/s40537-017-0073-8

Abstract

Outliers are samples generated by mechanisms different from those that produce normal data samples. Graphs, in particular social network graphs, may contain nodes and edges that are created by scammers, by malicious programs, or mistakenly by normal users. Detecting outlier nodes and edges is important for data mining and graph analytics. However, previous research in the field has focused only on detecting outlier nodes. In this article, we study the properties of edges and propose effective outlier edge detection algorithms. The proposed algorithms are inspired by the community structures that are very common in social networks. We found that the graph structure around an edge holds critical information for determining the authenticity of the edge. We evaluated the proposed algorithms by injecting outlier edges into real-world graph data. Experimental results show that the proposed algorithms can effectively detect outlier edges. In particular, the algorithm based on the Preferential Attachment Random Graph Generation model consistently gives good performance regardless of the test graph data. More importantly, by analyzing the authenticity of the edges in a graph, we are able to reveal the underlying structure and properties of the graph. Thus, the proposed algorithms are not limited to outlier edge detection. We demonstrate three different applications that benefit from the proposed algorithms: (1) a preprocessing tool that improves the performance of graph clustering algorithms; (2) an outlier node detection algorithm; and (3) a novel noisy data clustering algorithm. These applications show the great potential of the proposed outlier edge detection techniques. They also highlight the importance of analyzing the edges in graph mining, a topic that has been mostly neglected by researchers.

Highlights

Graphs are an important data representation, which have been extensively used in many scientific fields such as data mining, bioinformatics, multimedia content retrieval and computer vision
We evaluate the performance of the proposed outlier edge detection algorithms
We introduce outlier edge detection algorithms based on two random graph

Summary

Background

Graphs are an important data representation that has been used extensively in many scientific fields such as data mining, bioinformatics, multimedia content retrieval and computer vision. The edges with low authentic scores, which are called weak links in this paper, are likely to be outliers. We evaluated the outlier edge detection algorithm that is based on the authentic score using injected edges in real-world graph data. Akoglu et al. detected outlier nodes using the near-cliques and stars, heavy vicinities, and dominant heavy links properties of the ego-network, the induced network formed by a focal node and its direct neighbors [12]. They observed that some pairs of the features of normal nodes follow a power law and defined an outlier score function that measures the deviation of a node from the normal patterns. Detection of missing edges (or link prediction) is the opposite of outlier edge detection: these algorithms find missing edges between pairs of nodes in a graph. In practice, the similarity scores used by these algorithms do not give satisfactory performance if one uses them to detect outlier edges.
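
To make the idea of scoring an edge by the graph structure around it more concrete, here is a minimal Python sketch. It is not the authentic score defined in the paper, whose exact formula is not given on this page. As an illustrative assumption, it compares the number of links observed between the neighborhoods of the two end nodes with the number expected under a simple degree-proportional null model, deg(u) * deg(v) / (2m), which is only a rough stand-in for a preferential attachment model. The function names (build_adjacency, edge_support_score, score_all_edges) and the toy graph are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbours)}."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def edge_support_score(adj, num_edges, a, b):
    """Illustrative score for edge (a, b): observed minus expected links
    between the neighbourhoods of a and b, ignoring the edge itself.

    Assumption: the expected count uses deg(u) * deg(v) / (2m), a simple
    degree-proportional null model, not the paper's formula.
    """
    group_a = (adj[a] | {a}) - {b}  # a and its neighbours, without b
    group_b = (adj[b] | {b}) - {a}  # b and its neighbours, without a
    # Unordered node pairs spanning the two groups, minus the edge under test.
    pairs = {frozenset((u, v)) for u in group_a for v in group_b if u != v}
    pairs.discard(frozenset((a, b)))

    observed = 0
    expected = 0.0
    for pair in pairs:
        u, v = tuple(pair)
        if v in adj[u]:
            observed += 1
        expected += len(adj[u]) * len(adj[v]) / (2.0 * num_edges)
    return observed - expected


def score_all_edges(edges):
    """Return {edge: score}; low scores mark weakly supported edges."""
    adj = build_adjacency(edges)
    return {(u, v): edge_support_score(adj, len(edges), u, v) for u, v in edges}


# Toy graph (hypothetical): two 4-node cliques joined by one bridging edge.
toy_edges = (
    list(combinations(range(4), 2))
    + list(combinations(range(4, 8), 2))
    + [(0, 4)]
)
toy_scores = score_all_edges(toy_edges)
weakest_edge = min(toy_scores, key=toy_scores.get)
print(weakest_edge, round(toy_scores[weakest_edge], 2))
```

In this toy graph the bridging edge (0, 4) has no supporting links between the two neighborhoods, so it receives the lowest score, while edges inside a clique score higher. Ranking edges by such a score and flagging the lowest ones is one plausible reading of the weak-link idea, under the assumptions stated above.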

Methods

Motivation

Evaluation of the proposed algorithms

Conclusions


Journal: Journal of Big Data
Publication Date: Apr 26, 2017
Citations: 9
License type: open-access


