Abstract

This paper shows the feasibility of using the Kernel Spectral Clustering (KSC) method for community detection in big data networks. KSC employs a primal-dual framework to construct a model, which gives it the powerful property of effectively inferring community affiliation for out-of-sample nodes. Because the original large kernel matrix cannot fit into memory, we select a smaller subgraph that preserves the overall community structure and construct the model on it. The out-of-sample extension property is then used to infer the community membership of the unseen nodes. We provide a novel memory- and computationally efficient model selection procedure based on angular similarity in the eigenspace. We demonstrate the effectiveness of KSC on large-scale synthetic networks and real-world networks such as the YouTube network, a road network of California and the LiveJournal network. These networks contain millions of nodes and several million edges.
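To make the notion of angular similarity in the eigenspace concrete, the following is a minimal sketch that assigns each projected node to the cluster prototype with which it forms the smallest angle (largest cosine). It is not the paper's model selection procedure: the function names, the 2-D projections and the prototype vectors are all hypothetical and chosen only for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def assign_by_angle(projections, prototypes):
    """Label each projection with the index of its most similar prototype."""
    labels = []
    for point in projections:
        sims = [cosine_similarity(point, proto) for proto in prototypes]
        labels.append(sims.index(max(sims)))
    return labels


# Hypothetical eigenspace projections and cluster prototypes (assumed data).
prototypes = [(1.0, 0.0), (0.0, 1.0)]
projections = [(2.0, 0.1), (0.2, 3.0), (5.0, 0.5)]
labels = assign_by_angle(projections, prototypes)
```

Because only the direction of each projection matters, nodes far from the origin and nodes close to it are treated alike as long as they lie along the same line.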

Highlights

  • In the modern era, complex networks are ubiquitous

  • We show that kernel spectral clustering is applicable for community detection in big data networks

  • For the Infomap [7] and Louvain [9] community detection techniques, we evaluate the subset obtained by Fast and Unique Representative Subset (FURS) on various metrics such as computation time, clustering coefficients (CCF), coverage and variation of information (VI)
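Two of the metrics named in the highlights, coverage and variation of information, have compact textbook definitions. The sketch below follows those standard definitions (coverage as the fraction of edges that fall inside a community, VI as H(A) + H(B) - 2 I(A; B) with natural logarithms). Whether the paper uses exactly these variants is an assumption, and the function names and toy data are hypothetical.

```python
import math
from collections import Counter


def coverage(edges, labels):
    """Fraction of edges whose two endpoints share a community label."""
    if not edges:
        raise ValueError("coverage is undefined for a graph without edges")
    intra = sum(1 for u, v in edges if labels[u] == labels[v])
    return intra / len(edges)


def variation_of_information(labels_a, labels_b):
    """VI between two partitions of the same node set, in nats."""
    n = len(labels_a)
    if n != len(labels_b):
        raise ValueError("both partitions must label the same nodes")
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    vi = 0.0
    for (a, b), n_ab in joint.items():
        p_ab = n_ab / n
        p_a = count_a[a] / n
        p_b = count_b[b] / n
        # Each term contributes to H(A|B) + H(B|A), which equals VI.
        vi -= p_ab * (math.log(p_ab / p_a) + math.log(p_ab / p_b))
    return vi


# Toy path graph on four nodes with two communities (assumed data).
toy_edges = [(0, 1), (1, 2), (2, 3)]
toy_labels = [0, 0, 1, 1]
toy_coverage = coverage(toy_edges, toy_labels)
vi_same = variation_of_information(toy_labels, toy_labels)
vi_independent = variation_of_information([0, 0, 1, 1], [0, 1, 0, 1])
```

A VI of zero means the two partitions are identical, and larger values mean they share less information.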


Summary

Introduction

The omnipresence of complex networks is reflected in domains like social networks, web graphs, road graphs, communication networks, biological networks and financial networks. A network is modeled as a graph in which the nodes are the vertices and edges (E) depict the relationships between these nodes. These networks exhibit community-like structure, where nodes within a community are densely connected and connections between communities are sparse. The major drawback of spectral clustering methods is the construction of the large N × N affinity matrix, where N is the number of nodes in the network, which requires calculating the similarity between every pair of nodes in the network. As the size of the network increases, the O(N^2) computation and storage of this N × N affinity matrix become infeasible. (Entropy 2013, 15)
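The quadratic storage cost can be made tangible with a back-of-the-envelope calculation. The sketch below assumes a dense matrix with 8-byte (double-precision) entries. The node counts are illustrative values, not figures taken from the paper's datasets.

```python
def dense_affinity_bytes(num_nodes, bytes_per_entry=8):
    """Bytes needed to store a dense num_nodes x num_nodes matrix."""
    return num_nodes * num_nodes * bytes_per_entry


# Illustrative network sizes (assumed, not from the paper).
for n in (1_000, 100_000, 1_000_000):
    gigabytes = dense_affinity_bytes(n) / 1e9
    print(f"N = {n:>9,d}: {gigabytes:,.3f} GB")

# One million nodes with 8-byte entries, expressed in terabytes.
million_node_tb = dense_affinity_bytes(1_000_000) / 1e12
```

Under these assumptions a network with one million nodes already needs 8 TB for the dense affinity matrix, which is why a model built on a small representative subgraph is attractive.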
