Abstract

Large-scale graph processing has recently emerged as a significant research area within big data analytics. Distributed graph analytics offers intuitive insights into node interactions in large-scale network data, and distributed graph computing is an emerging area of graph data mining that uncovers important node relationships in a given graph dataset. In this paper, we propose a new method to discover top-k user–user communities in a weighted bipartite network by defining a weighted similarity measure. We extend a structural similarity metric, the Otsuka–Ochiai coefficient, by incorporating node weights, and use the resulting measure to quantify the similarity between distinct items of a user–item network. Top-k user–user communities are then mined from these item-to-item similarities. We present two algorithms, TUCSGF and TUCFlink, that mine top-k user–user communities in a distributed manner according to the strength of the item-to-item similarities. TUCSGF runs on Apache Spark and takes advantage of Spark GraphFrames, while TUCFlink runs on Apache Flink and uses the functionality of Flink Gelly. We further explore two real-world network applications, an online learning network and a chain-of-hospitals network, together with the graph methods applicable to each. Finally, we systematically evaluate the execution time, memory consumption, and CPU usage of TUCSGF and TUCFlink on three distinct datasets; TUCFlink substantially outperforms TUCSGF in computation time. Applying distributed graph analytics to complex networks with distributed graph processing tools such as GraphX, GraphFrames, and Gelly provides more intuitive insights into the distinct types of node interactions in graph data mining.
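The abstract does not state the weighted formula, so the sketch below only illustrates the general idea under stated assumptions. The standard (unweighted) Otsuka–Ochiai coefficient of two sets A and B is |A ∩ B| / sqrt(|A| · |B|); here A and B are the sets of users connected to two items. The function `weighted_ochiai` is a hypothetical weighted variant, not the paper's definition: each shared user contributes the geometric mean of its two edge weights, normalised by the items' total weights, so it reduces to the standard coefficient when all weights are 1. The ranking helper `top_k_item_pairs`, and its choice to treat the users shared by a top pair as a candidate user–user group, are likewise hypothetical and run on a single machine rather than on Spark or Flink.

```python
from itertools import combinations
from math import sqrt


def ochiai(a: set, b: set) -> float:
    """Standard (unweighted) Otsuka-Ochiai coefficient: |A & B| / sqrt(|A| * |B|)."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def weighted_ochiai(a: dict, b: dict) -> float:
    """HYPOTHETICAL weighted variant (assumed form, not the paper's formula).

    a and b map user -> edge weight for two items. Shared users contribute the
    geometric mean of their two weights; the denominator uses total item weights.
    With all weights equal to 1 this reduces to ochiai(); for non-negative
    weights, Cauchy-Schwarz keeps the result in [0, 1].
    """
    total_a, total_b = sum(a.values()), sum(b.values())
    if total_a <= 0 or total_b <= 0:
        return 0.0
    shared = a.keys() & b.keys()
    numerator = sum(sqrt(a[u] * b[u]) for u in shared)
    return numerator / sqrt(total_a * total_b)


def top_k_item_pairs(item_users: dict, k: int) -> list:
    """Rank item pairs by weighted similarity and keep the k strongest.

    HYPOTHETICAL grouping step: the users shared by each top pair are returned
    as a candidate user-user group alongside the score.
    """
    scored = []
    for i, j in combinations(sorted(item_users), 2):
        score = weighted_ochiai(item_users[i], item_users[j])
        if score > 0:
            shared = sorted(item_users[i].keys() & item_users[j].keys())
            scored.append((score, i, j, shared))
    # Highest score first; ties broken by item names for a deterministic order.
    scored.sort(key=lambda t: (-t[0], t[1], t[2]))
    return scored[:k]


if __name__ == "__main__":
    # Toy weighted bipartite network: item -> {user: edge weight}.
    item_users = {
        "i1": {"u1": 2.0, "u2": 1.0},
        "i2": {"u1": 2.0, "u2": 1.0, "u3": 1.0},
        "i3": {"u3": 4.0},
    }
    for score, i, j, users in top_k_item_pairs(item_users, k=2):
        print(f"{i}-{j}: {score:.3f} shared users {users}")
    # prints:
    # i1-i2: 0.866 shared users ['u1', 'u2']
    # i2-i3: 0.500 shared users ['u3']
```

The geometric-mean form was chosen only because it keeps the score bounded and collapses to the unweighted coefficient; other weightings (for example the minimum of the two edge weights) would be equally plausible, and in a distributed setting the pairwise loop would be replaced by a join over shared users in GraphFrames or Gelly.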

