Abstract

Analyzing social networks has received a lot of attention in the recent literature. Many papers have proposed new techniques for mining social networks to help study this huge amount of data. However, to the best of our knowledge, none of them considers the semantic meaning of the nodes' interests when clustering the network. In this work, we propose a new algorithm, namely GeoSim, for clustering the users of any social network site into communities based on the semantic meaning of the nodes' interests as well as their relationships with each other. Moreover, this paper proposes a parallel version of the GeoSim algorithm that utilizes the MapReduce model to run on multiple machines simultaneously and obtain results faster. The two versions of the algorithm (centralized and parallel) are examined thoroughly to test their performance. The experiments show that both versions of the GeoSim algorithm achieve high community detection accuracy and scale linearly with the size of the cluster.

Highlights

  • Social network sites, such as Facebook, have played a great role in our daily lives due to their ability to link users regardless of their geographical location (Tang et al, 2014)

  • Based on the GeoSim algorithm, we propose a parallel version that utilizes Hadoop, a MapReduce framework, since social networks are typically very large and single-machine processing is insufficient

  • The GeoSim algorithm produced three communities: the green community consists of authors whose interests relate to databases, the red community has authors interested in data mining, and the black community comprises researchers working in artificial intelligence
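The highlights describe a parallel GeoSim that runs on Hadoop. The outline suggests a clustering stage built around selected mean nodes, but those details are not part of this summary, so the Python sketch below is only a hypothetical illustration of the MapReduce pattern: a mapper assigns each node to its best-scoring mean node and a reducer gathers each community. The names `mapper`, `reducer`, `run_job` and the Jaccard score are assumptions, not the paper's method, and the shuffle that Hadoop performs across machines is simulated here with an in-memory dictionary.

```python
from collections import defaultdict


def jaccard(a, b):
    """Interest-set overlap; a placeholder for GeoSim's actual node-to-mean score."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mapper(node, interests, means):
    """Map step: emit (best mean node, node). max() keeps the first mean on ties."""
    best = max(means, key=lambda m: jaccard(interests[node], interests[m]))
    yield best, node


def reducer(mean, nodes):
    """Reduce step: every node keyed to the same mean forms one community."""
    return mean, sorted(nodes)


def run_job(interests, means):
    """Run map, shuffle and reduce in one process to mimic a single MapReduce job."""
    groups = defaultdict(list)
    for node in interests:                      # each call could be a separate map task
        for key, value in mapper(node, interests, means):
            groups[key].append(value)           # shuffle: group values by key
    return dict(reducer(key, values) for key, values in groups.items())


# Invented toy data, loosely echoing the three research areas in the highlights.
interests = {
    "a1": {"databases", "query optimization"},
    "a2": {"databases", "indexing"},
    "a3": {"data mining", "clustering"},
    "a4": {"data mining", "classification"},
    "a5": {"artificial intelligence", "planning"},
}
means = ["a1", "a3", "a5"]  # hypothetical pre-selected mean nodes

print(run_job(interests, means))
# {'a1': ['a1', 'a2'], 'a3': ['a3', 'a4'], 'a5': ['a5']}
```

Because each mapper call depends only on one node and the shared list of mean nodes, the map step splits naturally across machines, which is the property a MapReduce version relies on.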


Summary

Introduction

Social network sites, such as Facebook, have played a great role in our daily lives due to their ability to link users regardless of their geographical location (Tang et al, 2014). Community detection aims to divide a social network into groups of nodes that are highly related to each other. Most of the proposed community detection algorithms cluster the network based solely on the linkage behavior between the nodes. While geodesic distance provides a relatively good measure of relatedness, it is not adequate to fully capture how well one node knows another. For example, if node A is directly linked to both node B and node C, geodesic distance alone cannot tell whether A is more closely related to B or to C, since both distances are equal. The contribution of this paper is a new algorithm, GeoSim, that detects communities in a given social network based on the geodesic distances between the nodes and the similarities between their interests.
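To make the A, B, C example concrete, the Python sketch below computes geodesic distance with breadth-first search and contrasts it with a simple interest-overlap score. The Jaccard similarity, the `relatedness` function and its weighting parameter `alpha` are assumptions for illustration only; they are not GeoSim's semantic similarity measure, which this summary does not specify.

```python
from collections import deque


def geodesic_distance(graph, source, target):
    """Return the number of edges on a shortest path (BFS), or None if unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour == target:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def interest_similarity(a, b):
    """Jaccard overlap of two interest sets (a stand-in for semantic similarity)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def relatedness(graph, interests, u, v, alpha=0.5):
    """Hypothetical score: alpha * closeness + (1 - alpha) * interest similarity."""
    dist = geodesic_distance(graph, u, v)
    if dist is None:
        closeness = 0.0          # unreachable nodes get no structural credit
    elif dist == 0:
        closeness = 1.0          # a node compared with itself
    else:
        closeness = 1.0 / dist   # closer nodes score higher
    return alpha * closeness + (1 - alpha) * interest_similarity(interests[u], interests[v])


# Toy network: A is directly linked to both B and C (invented data).
graph = {"A": ["B", "C"], "B": ["A"], "C": ["A"]}
interests = {
    "A": {"databases", "data mining"},
    "B": {"databases", "data mining"},
    "C": {"artificial intelligence"},
}

print(geodesic_distance(graph, "A", "B"), geodesic_distance(graph, "A", "C"))  # 1 1
print(relatedness(graph, interests, "A", "B"))  # 1.0
print(relatedness(graph, interests, "A", "C"))  # 0.5
```

Both pairs sit at geodesic distance 1, yet the interest term separates B from C, which is exactly the gap the introduction describes.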

Literature Review
MapReduce
The Basic Idea
Detailed Description
Time Complexity
Input Format
Mean Nodes Selection
Distance and Similarity Calculation
Clustering Stage
Experiments
GeoSim Results
GeoSimMR results
Findings
Conclusion
