Abstract

Vast amounts of multimedia data contain massive and multifarious social information, which is used to construct large-scale social networks. In a complex social network, a character should ideally be denoted by one and only one vertex. However, a character is often denoted by two or more vertices with different names and is therefore treated as multiple, different characters. This problem leads to incorrect results in network analysis and mining. The practical challenge is that character uniqueness is hard to confirm correctly because of many complicating factors, for example name changes and anonymization, which lead to character duplication. The limited early research on this problem relied heavily on supplementary attribute information from databases. In this paper, we propose a novel method to merge character vertices that refer to the same entity but are denoted by different names. With this method, we first build the relationship network among characters based on records of their participation in social activities, which are extracted from multimedia sources. We then define temporal activity paths (TAPs) for each character over time and measure the similarity of the TAPs of any two characters. If the similarity is high enough, the two vertices should be considered the same character; on this basis, we determine whether to merge the two character vertices. Our experiments showed that this solution can accurately confirm character uniqueness in large-scale social networks.
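
The pipeline described in the abstract (activity records, then per-character TAPs, then pairwise similarity) can be sketched roughly as follows. This is a minimal illustration under assumptions, not the paper's implementation: the record layout `(name, activity, year)`, the helper names `build_taps` and `jaccard_tap_similarity`, and the sample data are all hypothetical, and Jaccard overlap is used only as a stand-in because the abstract does not define how TAP similarity is computed.

```python
from collections import defaultdict


def build_taps(records):
    """Group (name, activity, year) records into one time-ordered path per name.

    The record layout is an assumption made for this sketch.
    """
    taps = defaultdict(list)
    for name, activity, year in records:
        taps[name].append((year, activity))
    # Sorting by (year, activity) gives each character a temporal activity path.
    return {name: sorted(path) for name, path in taps.items()}


def jaccard_tap_similarity(tap_x, tap_y):
    """Stand-in similarity: Jaccard overlap of (year, activity) steps.

    This is NOT the paper's SimTAP; it only illustrates comparing two paths.
    """
    steps_x, steps_y = set(tap_x), set(tap_y)
    if not steps_x and not steps_y:
        return 0.0
    return len(steps_x & steps_y) / len(steps_x | steps_y)


# Hypothetical records: two spellings of one person, plus an unrelated person.
records = [
    ("A. Smith", "conference", 2010),
    ("A. Smith", "workshop", 2011),
    ("Alice Smith", "conference", 2010),
    ("Alice Smith", "workshop", 2011),
    ("Bob Lee", "concert", 2010),
]

taps = build_taps(records)
sim_same = jaccard_tap_similarity(taps["A. Smith"], taps["Alice Smith"])
sim_diff = jaccard_tap_similarity(taps["A. Smith"], taps["Bob Lee"])
```

In this toy data the two spellings share every step of their paths, so they score as highly similar, while the unrelated character shares none. A real similarity measure would also need to tolerate partial overlap and differing time spans.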

Highlights

  • In the past decade, the mobile Internet and social multimedia applications have become an indispensable part of social life, and huge volumes of multimedia data are being produced and consumed [1]

  • A large-scale social network is based on diversified multimedia data, which is multimodal [31]; for instance, an image can be described by a color modality or a shape modality

  • We proposed a uniqueness measurement for characters: after calculating SimTAP(Vx, Vy), we set a character uniqueness threshold θ, based on the features of the network and the data-analytic requirements, to screen the results
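
The threshold screening in the last highlight can be sketched as follows. This is a minimal illustration under assumptions: the similarity scores are invented placeholders rather than SimTAP values from the paper, the comparison `>= theta` and the value 0.8 are assumed, and `screen_pairs` and `merge_vertices` are hypothetical helper names. Merging uses a standard union-find structure, which the source does not specify.

```python
def screen_pairs(similarities, theta):
    """Keep the vertex pairs whose similarity reaches the threshold theta."""
    return [pair for pair, score in similarities.items() if score >= theta]


def merge_vertices(vertices, pairs):
    """Group vertices connected by screened pairs, using union-find."""
    parent = {v: v for v in vertices}

    def find(v):
        # Follow parent links to the root, halving the path along the way.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for x, y in pairs:
        parent[find(x)] = find(y)

    groups = {}
    for v in vertices:
        groups.setdefault(find(v), set()).add(v)
    return list(groups.values())


# Placeholder scores for illustration only (not results from the paper).
vertices = ["V1", "V2", "V3", "V4"]
similarities = {("V1", "V2"): 0.92, ("V2", "V3"): 0.35, ("V3", "V4"): 0.81}
theta = 0.8  # assumed value; the source sets it per network and requirements

candidates = screen_pairs(similarities, theta)
groups = merge_vertices(vertices, candidates)
```

Raising θ makes merging more conservative (fewer false merges, more missed duplicates), and lowering it does the opposite, which is why the source ties its choice to the network's features and the analysis requirements.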

Summary

Introduction

The mobile Internet and social multimedia applications have become an indispensable part of social life, and huge volumes of multimedia data are being produced and consumed [1]. We extract information and construct social transaction databases from vast amounts of multimedia data, such as text, images [4, 5], videos, and audio [6], to construct large-scale social networks, which are modelled as graphs [7] with a node-edge representation [8]. Characters are marked up as different vertices under their former and present names. These vertices have the same personal information, structure, and relation attributes. For social networks, these vertices and relationships are redundant, which will severely perturb the results of social network analysis. We regard temporal attributes as a key factor of relations and use them in computing the similarity of vertices. This boosts the accuracy of uniqueness confirmation.

Related Work
Social Network Modeling
Evaluating Uniqueness of Character Vertices Based on Structure Error
Character Uniqueness Measure Based on Activity Path Similarity
Conclusion and Future Work