Abstract

Billions of users interact intensively every day via Online Social Networks (OSNs) such as Facebook, Twitter, or Google+. This makes OSNs an invaluable source of information, and channel of actuation, for sectors like advertising, marketing, or politics. To get the most out of OSNs, analysts need to identify influential users who can be leveraged for promoting products, distributing messages, or improving the image of companies. In this report we propose a new unsupervised method, Massive Unsupervised Outlier Detection (MUOD), based on outlier detection, to support the identification of influential users. MUOD is scalable, and can hence be used in large OSNs. Moreover, it labels each outlier as a shape, magnitude, or amplitude outlier, depending on its features. This allows classifying the outlier users into multiple classes, which are likely to include different types of influential users. Applied to a subset of roughly 400 million Google+ users, MUOD automatically identified and discriminated sets of outlier users whose features are associated with different definitions of influential users, such as capacity to attract engagement, capacity to attract a large number of followers, or high infection capacity.

Highlights

  • We have introduced Massive Unsupervised Outlier Detection (MUOD), a novel unsupervised outlier detection algorithm based on functional data analysis (FDA) theory

  • MUOD outperforms other FDA-based outlier detection algorithms while offering high scalability, which allows it to be applied to large-scale multivariate datasets

  • We have tested the practical utility of MUOD on a specific problem: the detection of influencers in Online Social Networks (OSNs)

Summary

Users in Online Social Networks

Methods proposed for outlier detection in the area of functional data analysis (FDA) [12] could be applied to this problem as a way to identify different classes of outliers, which are likely to meet the requirements of different definitions of influential user. Their outlier detection performance is poor relative to MUOD, or their computational efficiency does not allow applying them to current OSNs (billions of users, each characterized by tens of variables). We leverage this dataset to validate the performance of the proposed method in pre-filtering users into different outlier classes, which likely include users meeting the criteria of different definitions of influence within Google+. To this end, we consider n = 21 parameters for each user, covering connectivity, activity, and user profile information. The qualitative difference between these users is very small, and the reason for their not being considered outliers by FBPLOT is rather weak.
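
To make the three outlier classes named in the abstract (shape, magnitude, amplitude) more concrete, the following is a minimal sketch and not the authors' implementation. It assumes each user is a row of feature values (for example, the 21 parameters mentioned above), and that the three indices can be derived from pairwise Pearson correlation (shape), least-squares slope (amplitude), and least-squares intercept (magnitude) between one user and every other user. The function name outlier_indices and the exact index definitions are assumptions made for illustration; the text above does not spell them out.

```python
import numpy as np


def outlier_indices(data):
    """Illustrative shape/amplitude/magnitude indices (hypothetical helper).

    data: array of shape (n_users, n_features); assumes no row is constant,
    since a constant row has zero variance and would cause division by zero.
    Returns three arrays of length n_users; larger values mean "more outlying".
    """
    x = np.asarray(data, dtype=float)
    n_features = x.shape[1]

    means = x.mean(axis=1)                 # per-user mean over features
    centered = x - means[:, None]
    cov = centered @ centered.T / n_features   # cov[i, j] between users i and j
    var = np.diag(cov)                     # per-user variance
    std = np.sqrt(var)

    corr = cov / np.outer(std, std)        # Pearson correlation of i with j
    slope = cov / var[None, :]             # slope when regressing user i on user j
    intercept = means[:, None] - slope * means[None, :]

    # A typical user correlates ~1 with the rest, with slope ~1 and intercept ~0.
    shape = np.abs(corr.mean(axis=1) - 1.0)
    amplitude = np.abs(slope.mean(axis=1) - 1.0)
    magnitude = np.abs(intercept.mean(axis=1))
    return shape, amplitude, magnitude
```

Users could then be flagged by applying a cutoff to each index separately; how that cutoff is chosen is left open here. Note that this sketch builds full n_users by n_users matrices, so it only suits small samples and says nothing about how scalability is achieved at the scale of a real OSN.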

Conclusions
Author Contributions
Additional Information