Abstract

Classification based on social dimensions is commonly used to handle the multi-label classification task in heterogeneous networks. However, traditional methods, which mostly rely on community detection algorithms to extract the latent social dimensions, perform poorly when those algorithms fail. In this paper, we propose a novel behavior-based social dimensions extraction method to improve classification performance in multi-label heterogeneous networks. In our method, nodes’ behavior features, instead of community memberships, are used to extract social dimensions. By introducing Latent Dirichlet Allocation (LDA) to model the network generation process, nodes’ connection behaviors with different communities can be extracted accurately and then applied as latent social dimensions for classification. Experiments on various public datasets show that the proposed method obtains satisfactory classification results in comparison to other state-of-the-art methods while using fewer social dimensions.

Highlights

  • Due to its wide applications in fraud detection [1], gene family prediction [2] and counterterrorism analysis [3], research on within-network classification has been very active in recent years.

  • We demonstrate that, when Latent Dirichlet Allocation (LDA) is used to model the network, the behavior-feature-based classifier (BBSD) performs better than the community-detection-based classifier (LDA-CD) under default parameters.

  • By extracting latent social dimensions from network connectivity information, the SocioDim framework can deal effectively with the multi-label classification task in heterogeneous networks.


Summary

Introduction

Due to its wide applications in fraud detection [1], gene family prediction [2] and counterterrorism analysis [3], research on within-network classification has been very active in recent years. Given a partially labeled network, in which the labels of some nodes are known, within-network classification aims to predict the labels of the remaining nodes. As nodes in a network are interconnected, relational classification methods can make use of the connectivity information to predict unlabeled nodes. Because the labels of neighboring nodes are highly correlated, an unlabeled node can be predicted via a weighted average of the estimated class memberships of its neighbors [4, 5]. Topological structure can also provide valuable information for classification, so similarity measures (such as random walk and common neighbors) are used to predict unlabeled nodes by estimating their structural similarity to labeled nodes [6, 7, 8]. By exploiting network connectivity information, all of the above methods have been shown to perform satisfactorily on the single-label classification task, which assumes that each node is associated with only one label.
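
As an illustration of the neighbor-based prediction described above, the following is a minimal sketch of a weighted-average relational classifier. It is not the implementation from [4, 5]; the function names `weighted_vote` and `predict`, the edge-dictionary input format, the uniform initialization of unlabeled nodes, and the fixed number of iterations are all assumptions made for this example. Edge weights are assumed to be positive.

```python
from collections import defaultdict


def weighted_vote(edges, known, n_iter=10):
    """Estimate class-membership scores by averaging over weighted neighbors.

    edges: dict mapping an undirected edge (u, v) to a positive weight
    known: dict mapping labeled nodes to their label
    (Both input formats are assumptions made for this sketch.)
    """
    # Build a symmetric adjacency structure: node -> {neighbor: weight}.
    nbrs = defaultdict(dict)
    for (u, v), w in edges.items():
        nbrs[u][v] = w
        nbrs[v][u] = w

    labels = sorted(set(known.values()))

    # Labeled nodes get a one-hot score; unlabeled nodes start uniform.
    scores = {}
    for node in nbrs:
        if node in known:
            scores[node] = {c: 1.0 if known[node] == c else 0.0 for c in labels}
        else:
            scores[node] = {c: 1.0 / len(labels) for c in labels}

    # Repeatedly replace each unlabeled node's scores with the weighted
    # average of its neighbors' current scores; labeled nodes stay fixed.
    for _ in range(n_iter):
        updated = {}
        for node in nbrs:
            if node in known:
                continue
            total = sum(nbrs[node].values())
            updated[node] = {
                c: sum(w * scores[nb][c] for nb, w in nbrs[node].items()) / total
                for c in labels
            }
        scores.update(updated)
    return scores


def predict(scores):
    """Pick the highest-scoring label for every node."""
    return {node: max(s, key=s.get) for node, s in scores.items()}


if __name__ == "__main__":
    # Toy network: node "u" is unlabeled and linked to two labeled nodes.
    toy_edges = {("a", "u"): 2.0, ("b", "u"): 1.0}
    toy_known = {"a": "fraud", "b": "ok"}
    print(predict(weighted_vote(toy_edges, toy_known)))
```

In the toy network, the unlabeled node is tied more strongly to the node labeled "fraud", so the weighted average favors that label. Note that this sketch assigns exactly one label per node, which is the single-label assumption mentioned above.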

