A Unified Framework for User Identification Across Online and Offline Data

Tianyi Hao, Yunsheng Cheng, Jingbo Zhou, Haishan Wu, Longbo Huang

doi:10.1109/tkde.2020.3000287

Abstract

User identification across multiple datasets has a wide range of applications, and research on the topic has grown in recent years. However, most existing work focuses on user identification with a single input data type, e.g., (I) identifying a user across multiple social networks with online data or (II) detecting a single user across heterogeneous trajectory datasets with offline data. Unlike previous work, this paper addresses user identification between online and offline datasets. We connect these two types of data through a mapping from IP addresses to physical locations. To solve this problem, we propose a novel framework consisting of three steps. First, we use a clustering method based on the locations of IP addresses to map IP addresses to specific physical location distributions. Second, we propose a novel pairwise index that reduces the space cost and running time of computing co-occurrence. Lastly, we apply a learning-to-rank method to combine the multiple features obtained in the first two steps. Based on this framework, we design experiments that demonstrate its efficiency (in time and space) and compare the precision and recall of our approach with those of other methods.
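To make the notion of co-occurrence concrete, the following is a minimal sketch, assuming that online events have already been mapped from IP addresses to a discrete location cell and that both datasets are bucketed into the same time slots. It uses a plain inverted index, not the pairwise index proposed in the paper, and a raw count in place of the learning-to-rank step. All names (`cooccurrence_counts`, `rank_candidates`, the `acct_*`, `dev_*`, and `cell_*` identifiers) are hypothetical.

```python
from collections import defaultdict


def cooccurrence_counts(online_events, offline_events):
    """Count shared (cell, time_bucket) keys for each (online, offline) user pair.

    Each event is a (user_id, cell, time_bucket) tuple. This is a generic
    inverted-index baseline, not the paper's pairwise index.
    """
    # Inverted index: (cell, bucket) -> set of offline users observed there.
    index = defaultdict(set)
    for user, cell, bucket in offline_events:
        index[(cell, bucket)].add(user)

    counts = defaultdict(int)
    # Deduplicate online events so each key counts at most once per pair.
    for user, cell, bucket in set(online_events):
        for offline_user in index.get((cell, bucket), ()):
            counts[(user, offline_user)] += 1
    return dict(counts)


def rank_candidates(counts, online_user):
    """Return offline candidates for one online user, highest count first."""
    candidates = [(off, c) for (on, off), c in counts.items() if on == online_user]
    # Sort by descending count, then by id so ties are deterministic.
    return sorted(candidates, key=lambda pair: (-pair[1], pair[0]))


# Toy data (illustrative only, not from the paper).
online = [
    ("acct_1", "cell_A", 9),
    ("acct_1", "cell_B", 20),
    ("acct_2", "cell_C", 9),
]
offline = [
    ("dev_x", "cell_A", 9),
    ("dev_x", "cell_B", 20),
    ("dev_y", "cell_A", 9),
    ("dev_y", "cell_C", 9),
]

counts = cooccurrence_counts(online, offline)
print(rank_candidates(counts, "acct_1"))
```

In this toy data, `acct_1` shares two (cell, time slot) keys with `dev_x` and one with `dev_y`, so `dev_x` ranks first. Only offline users who share at least one key with an online user are ever visited, which is the usual reason to index co-occurrence rather than compare all user pairs.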

Full Text