Detecting Group Anomalies in Tera-Scale Multi-Aspect Data via Dense-Subtensor Mining.

Kijung Shin,Bryan Hooi,Jisu Kim,Christos Faloutsos

doi:10.3389/fdata.2020.594302

Abstract

How can we detect fraudulent lockstep behavior in large-scale multi-aspect data (i.e., tensors)? Can we detect it when data are too large to fit in memory or even on a disk? Past studies have shown that dense subtensors in real-world tensors (e.g., social media, Wikipedia, and TCP dumps) signal anomalous or fraudulent behavior such as retweet boosting, bot activities, and network attacks. Thus, various approaches, including tensor decomposition and search, have been proposed for detecting dense subtensors rapidly and accurately. However, existing methods either suffer from low accuracy or assume that tensors are small enough to fit in main memory, which is unrealistic in many real-world applications such as social media and the web. To overcome these limitations, we propose D-Cube, a disk-based dense-subtensor detection method that can also run in a distributed manner across multiple machines. Compared to state-of-the-art methods, D-Cube is (1) Memory Efficient: it requires up to 1,561× less memory and handles 1,000× larger data (2.6TB); (2) Fast: it is up to 7× faster due to its near-linear scalability; (3) Provably Accurate: it gives a guarantee on the densities of the detected subtensors; and (4) Effective: it spotted network attacks from TCP dumps and synchronized behavior in rating data most accurately.

Highlights

Given a tensor that is too large to fit in memory, how can we detect dense subtensors? In particular, can we spot dense subtensors without sacrificing the speed and accuracy provided by in-memory algorithms? A common application of this problem is review fraud detection, where we aim to spot suspicious lockstep behavior among groups of fraudulent user accounts who review suspiciously similar sets of products.
Accuracy in Dense-Subtensor Detection (Section 3.2.2): We show that D-Cube gives the same accuracy guarantee as the in-memory algorithms proposed in Shin et al. (2018) if we set θ to 1.
Table 3: Summary of real-world datasets.
Q5. Effectiveness in Anomaly Detection: Which anomalies does D-Cube detect in real-world tensors?

Summary

Introduction

Given a tensor that is too large to fit in memory, how can we detect dense subtensors? In particular, can we spot dense subtensors without sacrificing the speed and accuracy provided by in-memory algorithms? A common application of this problem is review fraud detection, where we aim to spot suspicious lockstep behavior among groups of fraudulent user accounts who review suspiciously similar sets of products. Tensors allow us to consider additional dimensions in order to identify suspicious behavior of interest more accurately. Extraordinarily dense subtensors indicate groups of users with lockstep behaviors both in the products they review and along the additional dimensions (e.g., multiple users reviewing the same products at the exact same time). In addition to review-fraud detection, spotting dense subtensors has been found effective for many anomaly-detection tasks.
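To make the notion of an "extraordinarily dense" subtensor concrete, the following is a minimal sketch on toy review data. It assumes a sparse three-mode tensor (user, product, hour) stored as a Python dictionary, and it assumes one particular density measure, mass divided by the average number of attribute values per mode, which is a common choice in the dense-subtensor literature but is not specified in this section. The function name `arithmetic_density` and all data are hypothetical, and the sketch only measures the density of a given block; it does not implement D-Cube or any search procedure.

```python
from typing import Dict, Sequence, Set, Tuple

Entry = Tuple[str, ...]  # one attribute value per mode, e.g. (user, product, hour)


def arithmetic_density(tensor: Dict[Entry, int], block: Sequence[Set[str]]) -> float:
    """Mass of the block divided by the average number of attribute values per mode.

    This density measure is an assumption made for illustration.
    """
    # Sum the entries whose attribute values all fall inside the block.
    mass = sum(
        value
        for key, value in tensor.items()
        if all(attr in allowed for attr, allowed in zip(key, block))
    )
    avg_size = sum(len(allowed) for allowed in block) / len(block)
    return mass / avg_size if avg_size > 0 else 0.0


# Toy data: three accounts review the same two products in the same hour.
reviews: Dict[Entry, int] = {}
for user in ("u1", "u2", "u3"):
    for product in ("p1", "p2"):
        reviews[(user, product, "h0")] = 1
# Two unrelated reviews from ordinary accounts.
reviews[("u4", "p3", "h5")] = 1
reviews[("u5", "p1", "h9")] = 1

whole = [{"u1", "u2", "u3", "u4", "u5"}, {"p1", "p2", "p3"}, {"h0", "h5", "h9"}]
lockstep = [{"u1", "u2", "u3"}, {"p1", "p2"}, {"h0"}]

whole_density = arithmetic_density(reviews, whole)
lockstep_density = arithmetic_density(reviews, lockstep)
print(f"whole tensor: {whole_density:.2f}, lockstep block: {lockstep_density:.2f}")
```

Under these assumptions, the lockstep block scores higher than the tensor as a whole because the time mode contributes a single hour; dropping the time mode would make the same group look less exceptional, which is the sense in which additional dimensions sharpen detection.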
