Membership Detection Using Cooperative Data Mining Algorithms

Calvin Newport, Lisa Singh, Yiqing Ren

doi:10.1137/1.9781611973440.87

Abstract

A growing number of companies provide data mining and analytics solutions to customers using social media data. These companies typically collect data from social media sites continually and then run a data mining or analytics task on the collected snapshot of the content. Unfortunately, given the exponential increase in the volume of social media data, building local database snapshots and running computationally expensive algorithms is not always feasible. As an alternative to this centralized approach, we study the feasibility of cooperative algorithms in which data never leaves the mined social media network; instead, the network users themselves work together, using only the communication primitives provided by the social media site, to solve data mining problems. Although cooperative algorithms can be built for many different data mining tasks, we demonstrate the viability of the approach on one task fundamental to many social mining applications: membership detection, in which an individual using the social media site wants to efficiently deliver a request to some member of a known group whose membership is unknown. Using Twitter as our specific social graph, we seek cooperative algorithms that solve this problem with high probability even when only a small fraction of the Twitter network participates and the number of tweets generated is bounded. After validating the potential of cooperative solutions on Twitter, we empirically evaluate a collection of cooperative strategies on a snapshot of the Twitter network containing over 50 million users. Our best solution, which we call brokered token passing, reliably and efficiently detects group membership while requiring only a small number of tweets to be sent and a small percentage of users to participate.
