Abstract

We plan to develop a method for clustering a social network graph. Testing such a method requires a generated graph whose structure resembles that of existing social networks. This article presents an algorithm for distributed generation of such a graph. The algorithm takes into account basic properties such as the power-law distribution of the number of user communities and the dense overlaps found in social networks, among others. It also addresses problems present in similar work by other authors, for example the appearance of multiple edges during generation. A distinctive feature of the algorithm is that it is parameterized by the number of communities rather than by the number of connected users, as is done in other works; this choice reflects how the structure of existing social networks evolves. The paper lists the properties of the social network graph, gives a table of the variables the algorithm needs, sets out the generation algorithm step by step, and calculates the corresponding mathematical parameters. Generation is performed in a distributed way with the Apache Spark framework, and the paper describes in detail how tasks are divided with its help. The algorithm uses the Erdős–Rényi model for random graphs, which the authors consider the most suitable and the easiest to implement. The main advantages of the method are its low resource use compared with similar generators and its execution speed. The speed comes from distributed execution and from the fact that network users have unique numbers at all times and are ordered by them, so no sorting is needed. The algorithm should help not only in building an efficient clustering method; it may also be useful in other areas, for example search engines for social networks.
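The ideas named in the abstract (power-law community sizes, Erdős–Rényi edges inside each community, no multiple edges) can be illustrated with a minimal single-machine sketch. It is not the paper's Spark algorithm. The function names, the uniform choice of community members, and the edge probability of twice ln(n)/n are all assumptions made for illustration only.

```python
import math
import random
from itertools import combinations


def power_law_sizes(count, alpha, min_size, max_size, rng):
    """Draw `count` sizes with P(size = r) proportional to r ** -alpha."""
    sizes = list(range(min_size, max_size + 1))
    weights = [r ** -alpha for r in sizes]
    return rng.choices(sizes, weights=weights, k=count)


def erdos_renyi_edges(members, p, rng):
    """G(n, p) on the given user ids: keep each unordered pair with probability p."""
    return {(u, v) for u, v in combinations(sorted(members), 2) if rng.random() < p}


def generate_graph(num_users, num_communities, alpha, seed=0):
    """Hypothetical generator driven by the number of communities."""
    rng = random.Random(seed)
    sizes = power_law_sizes(num_communities, alpha, 2, num_users, rng)
    # Assumption: members are chosen uniformly, so communities may overlap.
    communities = [rng.sample(range(num_users), size) for size in sizes]
    edges = set()  # pairs stored as (u, v) with u < v, so no multiple edges
    for members in communities:
        n = len(members)
        # Assumed density: twice the G(n, p) connectivity threshold ln(n) / n.
        p = min(1.0, 2.0 * math.log(n) / n)
        edges |= erdos_renyi_edges(members, p, rng)
    return communities, edges
```

Because user ids are integers and every edge is stored with the smaller id first, an edge generated twice through two overlapping communities collapses into one entry of the set.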

Highlights

  • We plan to develop a method for clustering a social network graph

  • The article presents an algorithm for distributed graph generation

  • The algorithm takes into account basic properties of social network graphs


Summary

Problem statement

Belov Yu. Social network graph generation using Apache Spark.

$C_i$ is a subgraph of the graph $G$, and the size of the community is $|C_i| = r_{C_i}$. The number of communities is A2. Together the communities form a vertex cover of the graph. The number of different communities that vertex $j$ belongs to is $b_j$. A vertex $j \in C_i$ has an internal degree $d^{int}_{j,C_i}$ and an external degree $d^{ext}_{j,C_i}$. The internal degree is the number of edges connecting $j$ to other vertices in $C_i$. The external degree is, correspondingly, the number of edges connecting $j$ to the rest of the graph. The total degree of vertex $j$ is $d_j = d^{int}_{j,C_i} + d^{ext}_{j,C_i}$. The task is to generate a social network graph $G$ with the following properties:

1. The community sizes of the graph follow a power law with exponent $\alpha > 0$.

The graph contains a large connected component.
With high probability, each community $C_i$ is a connected graph.
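The degree definitions and the connectivity property above can be checked on a small graph with a sketch like the following. The helper names are hypothetical, and the representation (an adjacency map of sets, a community as a set of vertex ids) is an assumption chosen for brevity, not the paper's data layout.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Undirected adjacency map: vertex id -> set of neighbouring vertex ids."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    return adjacency


def split_degree(adjacency, community, j):
    """Return (internal, external) degree of vertex j with respect to `community`."""
    neighbours = adjacency.get(j, set())
    internal = len(neighbours & community)  # edges to other vertices in C_i
    external = len(neighbours - community)  # edges to the rest of the graph
    return internal, external


def is_connected(adjacency, community):
    """Breadth-first search that only follows edges inside the community."""
    start = next(iter(community))
    seen = {start}
    frontier = [start]
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in adjacency.get(u, set()) & community:
                if v not in seen:
                    seen.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return seen == community
```

For the path graph with edges (0, 1), (1, 2), (2, 3), (3, 4) and the community {0, 1, 2}, vertex 2 has internal degree 1 and external degree 1, so its total degree is 2, and the community is connected.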
Proposed method
Detailed algorithm of the proposed method