Abstract

Most users on social media have intrinsic characteristics, such as interests and political views, that can be exploited to identify and track them, raising privacy and identity concerns in online communities. In this article, we investigate the problem of user identity linkage on two behavior datasets collected from different experiments. Specifically, we focus on linking users based on their interaction behaviors with respect to content topics. We propose an embedding method that models each topic as a vector in a latent space to capture its deep semantics. A user is then modeled as a vector based on his or her interactions with topics. The topic embeddings are learned by optimizing a joint objective that combines three criteria: the compatibility between topics with similar semantics, the discriminative ability of topics to distinguish identities, and the consistency of the same user’s characteristics across the two datasets. We verify the effectiveness of our method on real-life datasets, and the results show that it outperforms related methods. We also analyze failure cases of our identity linkage method. This analysis shows that factors such as the visibility and variance of user behaviors and users’ group psychology can result in mis-linkages. We further examine the detailed behaviors of some representative users to understand the underlying reasons for their identities being mis-linked, and find that these users exhibit a high level of variance in their behaviors. Based on these experimental results, we introduce a confidence score into identity linkage to indicate the accuracy of the method’s results.
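To make the pipeline described above more concrete, the following is a minimal sketch of the general idea only, not the method of this article. It assumes topic vectors are already available (here they are hand-picked toy values rather than learned embeddings), builds each user vector as an interaction-weighted average of topic vectors, links users across two datasets by cosine similarity, and uses the margin between the best and second-best match as a stand-in confidence score. All names (`user_vector`, `link`, the topic labels, and the user IDs) are hypothetical, and the margin-based confidence is an assumption, since the abstract does not define how the confidence score is computed.

```python
import math

def user_vector(interactions, topic_vecs):
    """Interaction-weighted average of topic vectors (an assumed aggregation)."""
    dim = len(next(iter(topic_vecs.values())))
    total = sum(interactions.values())
    vec = [0.0] * dim
    if total == 0:
        return vec  # a user with no interactions maps to the zero vector
    for topic, count in interactions.items():
        for i, x in enumerate(topic_vecs[topic]):
            vec[i] += count * x / total
    return vec

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def link(query_vec, candidates):
    """Return (best_id, confidence) for one user from the first dataset.

    confidence is the similarity margin between the top two candidates,
    a hypothetical proxy and not the score defined in the article.
    """
    scored = sorted(
        ((cosine(query_vec, vec), uid) for uid, vec in candidates.items()),
        reverse=True,
    )
    best_sim, best_id = scored[0]
    runner_up = scored[1][0] if len(scored) > 1 else 0.0
    return best_id, best_sim - runner_up

# Toy topic embeddings (hand-picked, not learned).
topic_vecs = {
    "sports": [1.0, 0.0],
    "politics": [0.0, 1.0],
    "music": [0.7, 0.7],
}

# One user observed in dataset A, two candidate users in dataset B.
user_a = user_vector({"sports": 3, "music": 1}, topic_vecs)
dataset_b = {
    "u1": user_vector({"sports": 2, "music": 2}, topic_vecs),
    "u2": user_vector({"politics": 4}, topic_vecs),
}

best_id, confidence = link(user_a, dataset_b)
print(best_id, round(confidence, 3))
```

A small margin in this sketch would flag a linkage as ambiguous, which is one simple way to surface the kind of high-variance users that the failure analysis identifies as prone to mis-linkage.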
