Big Graph Analytics Platforms
Due to the growing need to process large graph and network datasets created by modern applications, recent years have witnessed a surging interest in developing big graph platforms. Tens of such big graph systems have already been developed, but there lacks a systematic categorization and comparison of these systems. This article provides a timely and comprehensive survey of existing big graph systems, and summarizes their key ideas and technical contributions from various aspects. In addition to the popular vertex-centric systems which espouse a think-like-a-vertex paradigm for developing parallel graph applications, this survey also covers other programming and computation models, contrasts those against each other, and provides a vision for the future research on big graph analytics platforms. This survey aims to help readers get a systematic picture of the landscape of recent big graph systems, focusing not just on the systems themselves, but also on the key innovations and design philosophies underlying them.
- Book Chapter
- 10.1007/978-3-030-20485-3_2
- Jan 1, 2019
With the growing interconnectivity of the world, Big Graph has become a popular emerging technology. Prominent examples of Big Graphs include social networks (for instance, Facebook and Twitter), biological networks, graph mining, big knowledge graphs, big web graphs and scholarly citation networks. A Big Graph consists of millions of nodes and trillions of edges. Big Graphs are growing exponentially and require large computing machinery. Big Graphs pose many issues, such as storage, scalability and processing. This paper gives a brief overview of in-memory Big Graph systems and some key challenges. It also sheds some light on future research agendas for in-memory systems.
- Research Article
18
- 10.1016/j.jnca.2016.05.008
- May 13, 2016
- Journal of Network and Computer Applications
MIRACLE: A multiple independent random walks community parallel detection algorithm for big graphs
- Conference Article
8
- 10.1109/cluster.2017.51
- Sep 1, 2017
It is common for real-world applications to analyze big graphs using distributed graph processing systems. Popular in-memory systems require an enormous amount of resources to handle big graphs. While several out-of-core approaches have been proposed for processing big graphs on disk, the high disk I/O overhead could significantly reduce performance. In this paper, we propose GraphH to enable high-performance big graph analytics in small clusters. Specifically, we design a two-stage graph partition scheme to evenly divide the input graph into partitions, and propose a GAB (Gather-Apply-Broadcast) computation model to make each worker process a partition in memory at a time. We use an edge cache mechanism to reduce the disk I/O overhead, and design a hybrid strategy to improve the communication performance. GraphH can efficiently process big graphs in small clusters or even a single commodity server. Extensive evaluations have shown that GraphH could be up to 7.8x faster compared to popular in-memory systems, such as Pregel+ and PowerGraph when processing generic graphs, and more than 100x faster than recently proposed out-of-core systems, such as GraphD and Chaos when processing big graphs.
- Conference Article
24
- 10.1109/bigdata.2014.7004471
- Oct 1, 2014
Big data machine learning and graph analytics have been widely used in industry, academia and government. Continuous advance in this area is critical to business success, scientific discovery, as well as cybersecurity. In this paper, we present some current projects and propose that next-generation computing systems for big data machine learning and graph analytics need innovative designs in both hardware and software that provide a good match between big data algorithms and the underlying computing and storage resources.
- Research Article
14
- 10.1007/s10619-019-07256-z
- Feb 6, 2019
- Distributed and Parallel Databases
Relations among data entities in most big data sets can be modeled by a big graph. Implementing and executing algorithms on the structure of big graphs is very important in different fields. Because of the inherently high volume of big graphs, their calculations should be performed in a distributed manner. Some distributed systems based on the vertex-centric model have been introduced for big graph calculations in recent years. The run-time performance of these systems depends on the partitioning and distribution of the graph. Therefore, graph partitioning is a major concern in this field. This paper concentrates on big graph partitioning approaches for the distribution of graphs in vertex-centric systems. It briefly discusses vertex-centric systems and formulates different models of the graph partitioning problem. Then, a review of recent methods of big graph partitioning for these systems is presented. Most recent methods of big graph partitioning for vertex-centric systems can be categorized into three classes: (i) stream-based methods that see vertices or edges of the graph in a stream and partition them, (ii) distributed methods that partition vertices or edges in a distributed manner, and (iii) dynamic methods that change partitions during the execution of algorithms to obtain better performance. This study compares the properties of different approaches in each class and briefly reviews methods that are not in these categories. The comparison indicates that the streaming methods are good choices for the initial load of the graph in vertex-centric systems, while the distributed and dynamic methods are appropriate for long-running applications.
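To make the stream-based class (i) concrete, here is a minimal single-machine sketch of one well-known streaming heuristic, usually called linear deterministic greedy. It is an assumption that this heuristic is representative of the methods the review covers; the function name `ldg_partition` and the `slack` parameter are hypothetical. Each vertex is seen once, in stream order, and is placed in the partition that already holds most of its neighbors, with a penalty that grows as the partition fills up.

```python
def ldg_partition(adjacency, k, slack=1.0):
    """Assign vertices to k partitions in a single pass over the stream.

    adjacency: dict mapping each vertex to a list of its neighbors;
               dict order is taken as the stream order (an assumption).
    slack:     hypothetical knob that loosens the per-partition capacity.
    """
    n = len(adjacency)
    capacity = slack * n / k          # target size of each partition
    assignment = {}                   # vertex -> partition index
    sizes = [0] * k                   # current size of each partition

    for vertex, neighbors in adjacency.items():
        # Count neighbors that were already placed, per partition.
        placed = [0] * k
        for u in neighbors:
            part = assignment.get(u)
            if part is not None:
                placed[part] += 1

        # Score = placed neighbors, damped by how full the partition is.
        # Ties are broken in favor of the emptier partition.
        def score(i):
            return (placed[i] * (1.0 - sizes[i] / capacity), -sizes[i])

        best = max(range(k), key=score)
        assignment[vertex] = best
        sizes[best] += 1
    return assignment


def edge_cut(adjacency, assignment):
    """Number of undirected edges whose endpoints lie in different partitions."""
    crossing = sum(
        1
        for v, neighbors in adjacency.items()
        for u in neighbors
        if assignment[v] != assignment[u]
    )
    return crossing // 2  # each undirected edge is listed from both ends
```

Because the decision for a vertex uses only the vertices seen so far, the result depends on the stream order, which is one reason such methods suit an initial load rather than long-running refinement.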
- Conference Article
29
- 10.1145/2882903.2912566
- Jun 26, 2016
In recent years we have witnessed a surging interest in developing Big Graph processing systems. To date, tens of Big Graph systems have been proposed. This tutorial provides a timely and comprehensive review of existing Big Graph systems, and summarizes their pros and cons from various perspectives. We start from the existing vertex-centric systems, in which a programmer thinks intuitively like a vertex when developing parallel graph algorithms. We then introduce systems that adopt other computation paradigms and execution settings. The topics covered in this tutorial include programming models and algorithm design, computation models, communication mechanisms, out-of-core support, fault tolerance, dynamic graph support, and so on. We also highlight future research opportunities on Big Graph analytics.
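The vertex-centric idea mentioned above can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions: it mimics synchronous supersteps with message passing and halting, it is not the API of any particular system, and the names `run_supersteps` and `min_label_compute` are hypothetical. The example program labels connected components by propagating the smallest vertex id.

```python
def run_supersteps(adjacency, compute, initial_value, max_supersteps=50):
    """Run a user-supplied per-vertex function in synchronous supersteps.

    compute(superstep, vertex, value, messages, neighbors) must return
    (new_value, [(target_vertex, message), ...]).
    A vertex is active in superstep 0 and afterwards only when it has mail.
    """
    values = {v: initial_value(v) for v in adjacency}
    inbox = {v: [] for v in adjacency}
    active = set(adjacency)

    for superstep in range(max_supersteps):
        if not active:
            break  # every vertex has halted and no messages are in flight
        outbox = {v: [] for v in adjacency}
        for v in active:
            values[v], outgoing = compute(
                superstep, v, values[v], inbox[v], adjacency[v]
            )
            for target, message in outgoing:
                outbox[target].append(message)
        # Messages sent in this superstep are delivered in the next one.
        inbox = outbox
        active = {v for v, mail in inbox.items() if mail}
    return values


def min_label_compute(superstep, vertex, value, messages, neighbors):
    """Think like a vertex: keep the smallest label seen, tell the neighbors."""
    candidate = min([value] + messages)
    if superstep == 0 or candidate < value:
        return candidate, [(u, candidate) for u in neighbors]
    return value, []  # nothing changed, so stay silent and halt


if __name__ == "__main__":
    graph = {1: [2], 2: [1, 3], 3: [2], 4: [5], 5: [4]}
    print(run_supersteps(graph, min_label_compute, lambda v: v))
```

The programmer writes only `min_label_compute`; scheduling, message delivery and termination live in the runtime, which is the separation that distributed vertex-centric systems exploit.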
- Research Article
261
- 10.1007/s00778-019-00556-x
- Jul 20, 2019
- The VLDB Journal
With the rapid development of information technologies, various big graphs are prevalent in many real applications (e.g., social media and knowledge bases). An important component of these graphs is the network community. Essentially, a community is a group of vertices which are densely connected internally. Community retrieval can be used in many real applications, such as event organization, friend recommendation, and so on. Consequently, how to efficiently find high-quality communities from big graphs is an important research topic in the era of big data. Recently, a large group of research works, called community search, have been proposed. They aim to provide efficient solutions for searching high-quality communities from large networks in real time. Nevertheless, these works focus on different types of graphs and formulate communities in different manners, and thus, it is desirable to have a comprehensive review of these works. In this survey, we conduct a thorough review of existing community search works. Moreover, we analyze and compare the quality of communities under their models, and the performance of different solutions. Furthermore, we point out new research directions. This survey not only helps researchers gain a better understanding of existing community search solutions, but also helps practitioners judge which solutions are appropriate for their needs.
- Research Article
2
- 10.1007/s10115-019-01328-3
- Jan 22, 2019
- Knowledge and Information Systems
Relations among data items can be modeled with graphs in most big data sets, such as social networks’ data. This modeling creates big graphs with many vertices and edges. Balanced k-way graph partitioning is a common problem with big graphs. It has many applications in several fields. There are many approximate solutions for this problem; however, most of them do not scale well enough for big graph partitioning and cannot be executed in a distributed manner. The vertex-centric model has been introduced recently as a scalable distributed processing method for big graphs. There are a few methods for graph partitioning based on this model. Existing approaches only consider one-step neighbors of vertices for graph partitioning and do not consider neighbors at a greater distance. In this paper, a distributed method is introduced based on the vertex-centric model for balanced k-way graph partitioning. This method applies the personalized PageRank vectors of vertices and partitions to decide how vertices are assigned to partitions. This method has been implemented in the Giraph system. The proposed method has been evaluated with several synthetic and real graphs. Experimental results have shown that this method scales to partitioning big graphs. It was also found that this method produces partitions with higher quality compared to the state-of-the-art stream-based methods and distributed methods based on the vertex-centric programming model. Its results are close to those of the Metis method.
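For readers unfamiliar with personalized PageRank, the quantity the abstract relies on, the following is a minimal power-iteration sketch. It is not the paper's Giraph implementation, and how the paper turns these vectors into partition decisions is not reproduced here. The function name, the restart probability `alpha` and the fixed iteration count are assumptions chosen for illustration.

```python
def personalized_pagerank(adjacency, seed, alpha=0.15, iterations=100):
    """Approximate the personalized PageRank vector of `seed`.

    A random walker restarts at `seed` with probability alpha and otherwise
    moves to a uniformly chosen neighbor. adjacency maps each vertex to a
    list of neighbors. Vertices without neighbors send their mass back to
    the seed (an assumption; other conventions exist).
    """
    rank = {v: 0.0 for v in adjacency}
    rank[seed] = 1.0
    for _ in range(iterations):
        nxt = {v: 0.0 for v in adjacency}
        nxt[seed] = alpha                      # restart mass
        for u, neighbors in adjacency.items():
            walk_mass = (1.0 - alpha) * rank[u]
            if neighbors:
                share = walk_mass / len(neighbors)
                for w in neighbors:
                    nxt[w] += share
            else:
                nxt[seed] += walk_mass
        rank = nxt
    return rank


if __name__ == "__main__":
    # Two triangles joined by the edge 2-3; the walk is seeded at vertex 0.
    graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
    for vertex, score in sorted(personalized_pagerank(graph, seed=0).items()):
        print(vertex, round(score, 3))
```

Unlike a one-step neighbor count, the resulting scores reflect multi-hop proximity to the seed, which is the property the abstract highlights.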
- Book Chapter
- 10.1007/978-3-030-26072-9_18
- Jan 1, 2019
As one of the most fundamental operations in graph analytics, community detection is to find groups of vertices that are more densely connected internally than with the rest of the graph. However, the detection of densely overlapped communities in big graphs is extremely challenging due to high time complexity. In this paper, we propose an effective and efficient graph algorithm called Cider to detect densely overlapped communities in big graphs. The intuition behind our algorithm is to exploit inherent properties of densely overlapped communities, and expand the community by minimizing its conductance. To make Cider more efficient, we extend the algorithm to expand the community more quickly by merging vertices in batches. We explicitly derive the time complexity of our algorithm and conclude that it can be implemented in near-linear time. Besides, we also implement a parallelized version of Cider to further improve its performance. Experimental results on real datasets show that our algorithms outperform existing approaches in terms of Flake Out Degree Fraction (FODF) and \(F_{1}\) score.
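The idea of expanding a community by minimizing its conductance can be sketched as follows. This is not Cider itself: it uses the standard conductance definition (cut edges divided by the smaller of the two side volumes) and a plain one-vertex-at-a-time greedy rule, both of which are assumptions made for illustration. The function names are hypothetical.

```python
def conductance(adjacency, community):
    """Standard conductance of a vertex set in an undirected graph.

    cut(S) / min(vol(S), vol(V - S)), where vol is the sum of degrees.
    Returns infinity when either side has zero volume (treated as worst case).
    """
    community = set(community)
    volume_inside = sum(len(adjacency[v]) for v in community)
    cut = sum(1 for v in community for u in adjacency[v] if u not in community)
    volume_total = sum(len(neighbors) for neighbors in adjacency.values())
    smaller_side = min(volume_inside, volume_total - volume_inside)
    return cut / smaller_side if smaller_side else float("inf")


def greedy_expand(adjacency, seed):
    """Grow a community from `seed`, adding the neighbor that lowers
    conductance the most, until no single addition improves it."""
    community = {seed}
    current = conductance(adjacency, community)
    while True:
        frontier = {u for v in community for u in adjacency[v]} - community
        best_vertex, best_score = None, current
        for u in sorted(frontier):  # sorted only to make ties deterministic
            candidate = conductance(adjacency, community | {u})
            if candidate < best_score:
                best_vertex, best_score = u, candidate
        if best_vertex is None:
            return community, current
        community.add(best_vertex)
        current = best_score
```

Recomputing conductance from scratch for every candidate is quadratic in the worst case; batch merging, as described in the abstract, is one way to reduce that cost.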
- Conference Article
6
- 10.1109/bigdata.2016.7840993
- Dec 1, 2016
We introduce GraphFlow, a big graph framework that is able to encode complex data science experiments as a set of high-level workflows. GraphFlow combines the Spark big data processing platform and the Galaxy workflow management system to offer a set of components for graph processing using a novel interaction model for creating and using complex workflows. GraphFlow contributes an easy-to-use interface and scalable algorithms for big graph analytics. We demonstrate GraphFlow use in large social network analysis with several case studies.
- Research Article
14
- 10.1016/j.cag.2020.02.004
- Feb 19, 2020
- Computers & Graphics
Spectrum-preserving sparsification for visualization of big graphs
- Book Chapter
- 10.1049/pbpc048e_ch12
- Sep 22, 2022
The analysis and study of data that can be transformed into a comprehensive graph is referred to as "graph analytics." Graph-based data analytics is a budding field in both data mining and data visualization and is applied in a wide variety of multi-disciplinary, high-impact applications such as network protection, banking, and healthcare [5]. Despite the fact that many methods have been developed in the past to analyze unstructured collections of multidimensional objects, graph analytic technologies are a recent trend that poses several challenges, not only in terms of the output of algorithms that are related to data mining that facilitate algorithmic computational data discovery [3]. Graph analytics primarily aims to evaluate graph-structured data in order to answer queries (e.g., Who is the most prominent person in a community? What are the main technology nodes for better practice and decision-making on the internet and urban networks?). Analysis of graphs has always been an important topic for researchers in the history of computing; however, the rise of advanced analytics for large amounts of semi-structured or unstructured data and the big data revolution have lately renewed the interest of the information systems community [1]. The qualitative effect of data, as well as the impact of graph analytics technology on organizations, has affected the requirements for business outcomes. Graph analytics for big data can not only recognize but also visualize crucial insights in big data. Furthermore, graph analytics may assist in identifying associations between different types of data and determining their existence and meaning within the context [2]. In this chapter, we will present the fundamentals of graph analytics and how graphs are related to big data.
The chapter will also show some of the most common graph databases and discuss various big data graph analytics approaches that use massive datasets, as well as different frameworks for each approach. In the latter part of the chapter, various issues and challenges related to big graph analytics will be addressed. A case study on implementing graph analytics using Python will also be discussed.
- Research Article
- 10.22667/jowua.2019.12.31.109
- Jan 17, 2020
- Journal of Wireless Mobile Networks, Ubiquitous Computing, and Dependable Applications
- Research Article
1
- 10.1093/comjnl/bxac161
- Nov 21, 2022
- The Computer Journal
Pattern matching in big graphs is important for different modern applications. Recently, this problem was defined in terms of multiple extensions of graph simulation, to reduce complexity and capture more meaningful results. These results were achieved through the relaxation of constraints commonly used in subgraph isomorphism pattern matching. Nevertheless, these graph simulation variant models are still too strict to provide results in many cases, especially when the analyzed graphs contain anomalies and incomplete information. To deal with this issue, we introduce a new graph pattern matching (GPM) method, called partial simulation, capable of retrieving matches despite missing parts of the pattern graph, such as vertices and/or edges. Furthermore, considering the number and inequality of the outputs, we define a relevance function to compute a value expressing how well each matched vertex respects the pattern graph. Similarly, we define partial dual simulation GPM that returns vertices that satisfy a part of the dual simulation constraints and assigns a relevance value to them. Additionally, we provide distributed scalable algorithms to evaluate the proposed partial simulation methods based on the distributed vertex-centric programming paradigm. Finally, our experiments on real-world data graphs demonstrate the effectiveness of the proposed models and the efficiency of their associated algorithms.
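As background for the abstract above, this is a minimal sketch of plain graph simulation, the baseline that the proposed partial simulation relaxes. It does not implement partial or dual simulation, and it runs on one machine rather than in a vertex-centric system. The function name and the dictionary-based graph encoding are assumptions. A data vertex matches a pattern vertex when the labels agree and, for every outgoing pattern edge, it has at least one outgoing edge to a matching data vertex.

```python
def graph_simulation(pattern_edges, pattern_labels, data_edges, data_labels):
    """Compute the maximum simulation relation by fixed-point refinement.

    pattern_edges / data_edges: dict vertex -> list of successor vertices.
    pattern_labels / data_labels: dict vertex -> label.
    Returns dict pattern_vertex -> set of matching data vertices; every set
    is empty when some pattern vertex has no match at all.
    """
    # Start from all label-compatible candidates.
    sim = {
        u: {v for v, label in data_labels.items() if label == pattern_labels[u]}
        for u in pattern_labels
    }
    changed = True
    while changed:
        changed = False
        for u, children in pattern_edges.items():
            for child in children:
                # Keep v only if some successor of v still matches `child`.
                keep = {
                    v
                    for v in sim[u]
                    if any(w in sim[child] for w in data_edges.get(v, ()))
                }
                if keep != sim[u]:
                    sim[u] = keep
                    changed = True
    if any(not matches for matches in sim.values()):
        return {u: set() for u in pattern_labels}  # the pattern does not match
    return sim
```

The final all-or-nothing check is exactly the strictness the abstract criticizes: one unmatched pattern vertex discards every other match, which motivates returning partial matches with a relevance value instead.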
- Conference Article
10
- 10.1109/bigdatacongress.2016.12
- Jun 1, 2016
Recently, several cluster computing frameworks have been proposed for scalable and efficient processing of big graphs. The manner in which graph data is partitioned and placed on the compute nodes has a significant impact on cluster performance. While most existing graph partitioning and placement strategies have been designed for static graphs, the graphs in many modern applications are dynamic (time-evolving). In this paper, we propose a unique, continuous and multi-cost sensitive approach for partitioning dynamic graphs. Our approach incorporates novel cost functions that take into account major factors that impact the performance of big graph processing clusters. We also present incremental algorithms to efficaciously handle various types of graph dynamics. Our algorithms are unique in that they work by locally adjusting the partitions thus avoiding massive repartitioning. This paper reports a series of experiments to demonstrate the effectiveness of the proposed algorithms in maximizing the performance of big graph processing systems on dynamic graphs.