Siamese Neural Networks for User Identity Linkage Through Web Browsing.

Yuanyuan Qiao, Fan Duo, Jie Yang, Yuewei Wu, Wenhui Lin

doi:10.1109/tnnls.2019.2929575

Abstract

Linking online identities of users among countless heterogeneous network services on the Internet can provide an explicit digital representation of users, which can benefit both research and industry. In recent years, user identity linkage (UIL) through the Internet has become an emerging task with great potential and many challenges. Existing works mainly focus on online social networks, using inconsistent profiles, content, and networks as features, or use sparse location-based data sets to link the online behaviors of a real person. To extend the UIL problem to a general scenario, we try to link the web-browsing behaviors of users, which can help to distinguish specific users, such as children or malicious users, from others. More specifically, we propose a Siamese neural network (NN) architecture-based UIL (SAUIL) model that learns and compares the highest-level feature representations of input web-browsing behaviors with deep NNs. Although the numbers of matching and nonmatching pairs in the UIL problem are highly imbalanced, previous studies have not considered imbalanced UIL data sets. Therefore, we further address the imbalanced learning issue by proposing a cost-sensitive SAUIL (C-SAUIL) model, which assigns higher costs to misclassifying the minority class. In the experiments, the proposed model is robust and exhibits good performance on very large, real-world data sets collected from different regions with distinct characteristics.
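
The two ideas named in the abstract, a shared (Siamese) encoder applied to both inputs of a pair and a loss that charges more for errors on the minority class, can be illustrated with a minimal sketch. The abstract does not specify the encoder, the distance measure, or the loss, so everything below is an assumption made for illustration only: the single tanh layer, the `exp(-distance)` mapping, the weighted cross-entropy, and the names `encode`, `match_probability`, `cost_sensitive_bce`, `pos_cost`, and `neg_cost` are hypothetical and are not taken from the paper.

```python
import math
import random


def encode(x, weights):
    """Hypothetical shared encoder: a single tanh layer.

    Both inputs of a pair are passed through the SAME weights,
    which is what makes the architecture Siamese.
    """
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in weights]


def match_probability(x_a, x_b, weights):
    """Map the Euclidean distance between the two embeddings into (0, 1].

    Assumption: exp(-distance) is used as a simple similarity score.
    """
    e_a = encode(x_a, weights)
    e_b = encode(x_b, weights)
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(e_a, e_b)))
    return math.exp(-dist)


def cost_sensitive_bce(p, label, pos_cost=10.0, neg_cost=1.0, eps=1e-7):
    """Weighted binary cross-entropy for one pair.

    Assumption: matching pairs (label 1) are the minority class, so a
    missed match is charged pos_cost times the unweighted loss.
    """
    p = min(max(p, eps), 1.0 - eps)  # clip to keep log() finite
    if label == 1:
        return -pos_cost * math.log(p)
    return -neg_cost * math.log(1.0 - p)


if __name__ == "__main__":
    random.seed(0)
    # Toy 4-dimensional browsing feature vectors and a 3x4 weight matrix.
    weights = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(3)]
    session_a = [0.9, 0.1, 0.0, 0.4]
    session_b = [0.8, 0.2, 0.1, 0.5]
    p = match_probability(session_a, session_b, weights)
    print("match probability:", p)
    print("loss if the pair is a true match:", cost_sensitive_bce(p, 1))
    print("loss if the pair is a nonmatch:", cost_sensitive_bce(p, 0))
```

In this sketch the cost ratio `pos_cost / neg_cost` is a free parameter; one common heuristic, again an assumption rather than the paper's choice, is to set it near the ratio of nonmatching to matching pairs in the training set.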

Full Text