Predicting Links and Link Change in Friends Networks: Supervised Time Series Learning with Imbalanced Data

William Hsu, Tim Weninger, Martin Paradesi

doi:10.1115/1.802823.paper68

Abstract

We address the problem of predicting links and link change in friends networks and introduce a new supervised learning method for both types of prediction. This extends previous work based on directed graph features, such as the indegree of candidate friends, and pair-dependent relational features, such as common interests. In this new work, we consider how differential user data, such as that produced by regular crawls of a social network site, can be used to produce a time series with which we can identify prediction problems over both links and link change. A key issue we address is the rarity of change between two successive versions of a social network, which results in severe imbalance between positive and negative examples of change. We compare existing approaches to coping with this problem, present positive results on new crawls of LiveJournal, and consider how temporal data can enhance the relational link mining process.

INTRODUCTION

The problem of predicting links between entities such as users and communities in a friends network can be treated as one of supervised inductive learning for classification. In previous work (Hsu et al. 2007), we introduced a system for classifying pairs of users known to lie within a radius of 2 of one another as friends or friends of friends. This classification task was defined on data sets consisting of 1000 and 4000 users from the blog service LiveJournal. Analysis of the graph structure and pair-dependent sets (e.g., mutual friends and common interests) produced a set of 12 features for each candidate pair. From this set of features, an effective predictor for link existence could be learned.

However, there were two key limitations to this approach. First, the features made available to machine learning algorithms included certain information that is not always available for prediction tasks. For example, in many social networks such as Facebook and LinkedIn, a user who has been added to the friend set of another is prompted to decide whether to issue a reciprocal link, whereas realistic prediction may require link existence to be identified before such information is known. Thus, some latency is inherent in realistic prediction task specifications. Second, a key unaddressed problem in this and other related work is that treating link existence as a function of a single snapshot of the friends network fails to take into account the full history of the graph. We show in this paper that data about changes to the link structure over time can provide an effective
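The idea of deriving link-change examples from two successive crawls, and the class imbalance that results, can be illustrated with a small sketch. It is not the system described in this paper: the function names, the toy graph, and the choice of two features (indegree of the candidate friend and number of mutual friends) are assumptions made purely for illustration.

```python
from itertools import permutations


def indegree(edges, v):
    """Number of users who list v as a friend in the given snapshot."""
    return sum(1 for (_, dst) in edges if dst == v)


def friends_of(edges, u):
    """Set of users that u lists as friends (outgoing links)."""
    return {dst for (src, dst) in edges if src == u}


def link_change_examples(old_edges, new_edges, users):
    """Build one labeled example per ordered pair (u, v) that is NOT linked
    in the old snapshot. The label is 1 if the link u -> v appears in the
    new snapshot and 0 otherwise. Features use the old snapshot only."""
    examples = []
    for u, v in permutations(sorted(users), 2):
        if (u, v) in old_edges:
            continue  # link already exists, so it cannot be newly added
        features = {
            "indegree_v": indegree(old_edges, v),
            "mutual_friends": len(
                friends_of(old_edges, u) & friends_of(old_edges, v)
            ),
        }
        label = 1 if (u, v) in new_edges else 0
        examples.append(((u, v), features, label))
    return examples


# Hypothetical toy data: two crawls of a four-user friends network.
users = {"a", "b", "c", "d"}
old = {("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")}
new = old | {("b", "d")}  # a single link is added between crawls

examples = link_change_examples(old, new, users)
positives = sum(label for _, _, label in examples)
print(f"{positives} positive out of {len(examples)} candidate pairs")
```

Even in this tiny graph only one of the eight candidate pairs changes between snapshots. In a network with thousands of users the number of candidate pairs grows roughly quadratically while the number of new links per crawl stays small, which is the imbalance referred to in the abstract.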
