Articles published on Big Graph
- Research Article
- 10.1002/cpe.70304
- Sep 28, 2025
- Concurrency and Computation: Practice and Experience
- Tengteng Cheng + 2 more
Atomic subgraphs are inherent and functionally meaningful structures in real-world graphs, capturing cohesive units such as social communities, molecular functional groups, or neural circuits. Preserving these atomic subgraphs during graph partitioning is crucial for maintaining semantic integrity, improving algorithmic interpretability, and reducing communication overhead in parallel processing. However, traditional partitioning methods often overlook this structural prior, leading to fragmentation of such subgraphs and degradation in downstream analytical quality. In this work, we propose a novel balanced graph partitioning approach that explicitly preserves atomic subgraphs through a coarsen-partition-refine framework. In the coarsening phase, smaller subgraphs are merged into a larger one based on the maximum edge-to-vertex weight ratio between subgraphs. In the partitioning phase, a spectral k-way method divides the coarsened graph into k balanced blocks. In the refinement phase, boundary subgraphs are exchanged between target blocks via designed rules, reducing cut-edge weights and ultimately yielding higher-quality balanced partitions. We evaluate our method on real-world and synthetic datasets by generating graphs with diverse subgraph distributions. The experimental results demonstrate the feasibility and effectiveness of our method.
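The coarsening rule lends itself to a compact illustration. Below is a minimal sketch, not the authors' code: it greedily merges the pair of subgraphs with the largest ratio of inter-subgraph edge weight to combined vertex weight, subject to a balance cap. All names (`coarsen`, `max_block_weight`, the dict-based weights) are hypothetical.

```python
# Hedged sketch of the coarsening phase described above: repeatedly merge
# the subgraph pair with the largest edge-to-vertex weight ratio.
from itertools import combinations

def coarsen(subgraphs, edge_weight, vertex_weight, max_block_weight):
    """subgraphs: list of frozensets of vertex ids (the atomic subgraphs).
    edge_weight[(u, v)]: weight of edge u-v, stored under one orientation;
    vertex_weight[v]: weight of vertex v."""
    def w_between(a, b):
        return sum(edge_weight.get((u, v), 0) + edge_weight.get((v, u), 0)
                   for u in a for v in b)

    def w_vertices(a):
        return sum(vertex_weight[v] for v in a)

    merged = True
    while merged:
        merged = False
        best, best_ratio = None, 0.0
        for a, b in combinations(subgraphs, 2):
            if w_vertices(a) + w_vertices(b) > max_block_weight:
                continue  # keep merged blocks small enough to balance later
            ratio = w_between(a, b) / (w_vertices(a) + w_vertices(b))
            if ratio > best_ratio:
                best, best_ratio = (a, b), ratio
        if best:
            a, b = best
            subgraphs.remove(a); subgraphs.remove(b)
            subgraphs.append(a | b)   # merge the winning pair
            merged = True
    return subgraphs
```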
- Research Article
- 10.14778/3746405.3746418
- May 1, 2025
- Proceedings of the VLDB Endowment
- Akhlaque Ahmad + 5 more
A k-plex is a dense subgraph structure in which every vertex may be non-adjacent to at most k vertices. Finding a maximum k-plex (MkP) in a big graph is a key primitive in many real applications such as community detection and biological network analysis. Many MkP algorithms have been proposed in recent years at top AI and DB conferences, featuring a broad range of sophisticated pruning techniques. In this paper, we study the pruning techniques of nine recent MkP algorithms, including kPlexT, Maple, Seesaw, DiseMKP, kPlexS, KpLeX, Maplex, BnB, and BS, by unifying them in a common framework called V-MkP. We group the proposed techniques into three categories: (1) branching, (2) upper bounding, and (3) reduction during subgraph exploration. We find that different pruning techniques can have drastically different performance impacts, but there exists a k-dependent configuration of techniques that leads to the best performance in the vast majority of cases. Interestingly, extensive experiments with our unified framework reveal that some techniques are not as effective as claimed in the original works, and we also discover a previously unmentioned technique that is actually the major performance booster when k > 5. We also study problem variants, such as finding all MkPs and finding the densest MkP (i.e., the one with the most edges) to cover community diversity, as well as effective algorithm parallelization. Our source code is released at https://github.com/akhlaqueak/MKP-Study.
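For readers unfamiliar with the structure, here is a one-function illustration of the k-plex definition (not any of the surveyed algorithms): a vertex set S is a k-plex when every vertex of S is adjacent to at least |S| - k vertices of S.

```python
# Hedged illustration of the k-plex definition quoted above.
import networkx as nx

def is_k_plex(G, S, k):
    S = set(S)
    # Each vertex may miss at most k vertices of S (itself included).
    return all(len(set(G[v]) & S) >= len(S) - k for v in S)

# A clique is a 1-plex; removing one edge from K4 leaves a 2-plex.
G = nx.complete_graph(4)
G.remove_edge(0, 1)
assert not is_k_plex(G, range(4), 1)
assert is_k_plex(G, range(4), 2)
```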
- Research Article
- 10.1145/3715124
- Mar 21, 2025
- ACM Transactions on Architecture and Code Optimization
- Coby Soss + 5 more
The k-dimensional Weisfeiler-Lehman (k-WL) algorithm—developed as an efficient heuristic for testing if two graphs are isomorphic—is a fundamental kernel for node embedding in the emerging field of graph neural networks. Unfortunately, the k-WL algorithm has exponential storage requirements, limiting the size of graphs that can be handled. This work presents a novel k-WL scheme with a storage requirement orders of magnitude lower while maintaining the same accuracy as the original k-WL algorithm. Due to the reduced storage requirement, our scheme allows for processing much bigger graphs than previously possible on a single compute node. For even bigger graphs, we provide the first distributed-memory implementation. Our k-WL scheme also has significantly reduced communication volume and offers high scalability. Our experimental results demonstrate that our approach is significantly faster and has superior scalability compared to five other implementations employing state-of-the-art techniques.
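As background, here is a minimal sketch of the classic 1-WL color refinement that k-WL generalizes; the paper's storage-reduction and distributed-memory schemes are not shown, and the function name is ours.

```python
# Minimal sketch of 1-WL color refinement (the base case of k-WL).
import networkx as nx

def wl_colors(G, rounds=3):
    color = {v: 0 for v in G}                     # uniform initial coloring
    for _ in range(rounds):
        # New color = (own color, sorted multiset of neighbor colors).
        sig = {v: (color[v], tuple(sorted(color[u] for u in G[v])))
               for v in G}
        palette = {s: i for i, s in enumerate(sorted(set(sig.values())))}
        color = {v: palette[sig[v]] for v in G}
    return color

# Graphs with different color histograms are certainly non-isomorphic.
G1, G2 = nx.path_graph(4), nx.star_graph(3)
print(sorted(wl_colors(G1).values()), sorted(wl_colors(G2).values()))
```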
- Research Article
- 10.63341/vitce/1.2025.42
- Mar 20, 2025
- Information Technology and Computer Engineering
- Andrii Banyk + 1 more
The purpose of the study was to develop approaches to using artificial intelligence to improve interactive, real-time visualisation of graph structures of big data, while optimising computing resources. During the study, graphs were constructed for analysing relationships in big data, and computational intelligence methods were used to optimise the processing and visualisation of graphs in an interactive format. The results of the study included the development of programs for building graph structures in Python in the Visual Studio Code environment and their further visualisation in Unity using C# in Visual Studio. First, a visualisation of a random Erdős–Rényi-type graph was shown, which was then recreated in Unity 3D space. Using Python libraries, graph generation and interactive web visualisation were implemented. Machine learning methods were used to optimise the placement of nodes in graphs, in particular autoencoders and principal component analysis for dimensionality reduction. A demonstration of the Barabási–Albert model made it possible to observe the clustering of nodes and their relationships in real time. In addition, interactive visualisation was demonstrated in which nodes were placed in 2D space according to the results of principal component analysis. The Louvain algorithm was used to perform clustering and visualise the community structure. The results showed that the use of neural networks significantly improves the accuracy and efficiency of node placement in graphs and reduces computational complexity. The results can be useful for scientific research involving the analysis of large graph structures and requiring interactive data visualisation.
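A small sketch in the spirit of the described pipeline, with illustrative library choices rather than the authors' exact code: generate an Erdős–Rényi graph and compute a 2D node layout via a hand-rolled PCA of the adjacency matrix, yielding coordinates a Unity or web front end could consume.

```python
# Illustrative sketch: Erdős–Rényi graph generation plus a PCA-based 2D layout.
import networkx as nx
import numpy as np

G = nx.erdos_renyi_graph(n=200, p=0.05, seed=42)
A = nx.to_numpy_array(G)

# PCA by hand: center the adjacency rows, project onto the top-2 eigenvectors.
X = A - A.mean(axis=0)
cov = X.T @ X / len(X)
vecs = np.linalg.eigh(cov)[1]      # eigenvectors, ascending eigenvalue order
coords = X @ vecs[:, -2:]          # 2D layout for interactive rendering

pos = {v: coords[i] for i, v in enumerate(G.nodes())}
print(pos[0])                      # coordinates a Unity/web front end could use
```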
- Research Article
- 10.1007/s11192-024-05228-4
- Jan 20, 2025
- Scientometrics
- Sang Yoon Kim + 3 more
Discovering AI adoption patterns from big academic graph data
- Research Article
- 10.2478/amns-2025-0152
- Jan 1, 2025
- Applied Mathematics and Nonlinear Sciences
- Kuo Li + 4 more
The Bayesian network (BN) model, a big data graph model that integrates causal inference and probabilistic representation, has received widespread attention in both academia and industry. However, with the advent of the big data era, traditional BN structure learning algorithms face unprecedented challenges in processing high-dimensional data, mainly manifested as a sharp increase in computational complexity and difficulty in achieving the desired accuracy within an acceptable time, which greatly limits the breadth and depth of their practical applications. To address this bottleneck, this article proposes a new approach that combines broad learning theory with BNs, referred to as the Broad Bayesian Neural Network (Broad-BNN). The model effectively reduces the dimensionality of the original high-dimensional data by introducing a feature mapping layer and gradually expanding it, while achieving non-linear transformation of information and effective feature extraction. The experimental results show that the proposed model achieves significant performance improvements on high-dimensional data classification problems, not only accelerating training but also significantly improving classification accuracy, providing a new perspective and solution for the difficulties of high-dimensional data processing.
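A hedged sketch of the generic broad-learning idea the model builds on (random feature-mapping nodes, enhancement nodes, and a closed-form ridge readout); this is not Broad-BNN itself, and all sizes and names are made up.

```python
# Sketch of a generic Broad Learning System layer stack, for intuition only.
import numpy as np

rng = np.random.default_rng(0)

def broad_features(X, n_map=10, n_enh=20):
    d = X.shape[1]
    Z = np.tanh(X @ rng.normal(size=(d, n_map)))      # feature-mapping layer
    H = np.tanh(Z @ rng.normal(size=(n_map, n_enh)))  # enhancement layer
    return np.hstack([Z, H])

def fit_readout(A, Y, lam=1e-2):
    # Ridge-regularized readout: W = (A^T A + lam I)^-1 A^T Y
    return np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ Y)

X = rng.normal(size=(100, 50))            # stand-in high-dimensional data
Y = (X[:, 0] > 0).astype(float)[:, None]
A = broad_features(X)
W = fit_readout(A, Y)
print(((A @ W > 0.5) == Y).mean())        # training accuracy of the sketch
```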
- Research Article
- 10.65000/jfkhmg23
- Nov 30, 2024
- International Journal of Modern Computation, Information and Communication Technology
- Chethan Chandra S Basavaraddi
The rapid expansion of large-scale datasets has highlighted the importance of scalable graph processing methods within distributed computing environments. Apache Hadoop, through its integration of the Hadoop Distributed File System (HDFS) and MapReduce, provides a foundation for handling such challenges. This study explores the incorporation of Depth-First Search (DFS) into Hadoop for efficient big data graph processing. The work outlines the design of Hadoop-compatible graph structures and a MapReduce-based DFS framework optimized for large-scale traversal. Advanced implementations, including iterative, randomized, and parallel DFS, are evaluated for their impact on execution efficiency, resource allocation, and scalability. The proposed integration enables applications in web graph analysis, computational biology, and social network exploration, while also providing a generalized foundation for adapting other graph algorithms within Hadoop. Quantitative evaluations demonstrate DFS’s ability to process large adjacency matrices, efficiently traverse graphs of up to 56–101 vertices, and highlight performance trade-offs in terms of execution time, memory handling, and scalability compared with sequential DFS, confirming the benefits of distributed parallelization in Hadoop-based environments.
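As a minimal illustration of the iterative variant evaluated above, here is an explicitly stack-based DFS in plain Python (not the study's Hadoop/MapReduce code), which avoids the recursion limits that matter at scale.

```python
# Minimal iterative DFS over an adjacency-list graph.
def iterative_dfs(adj, start):
    """adj: dict mapping vertex -> list of neighbors."""
    visited, order, stack = set(), [], [start]
    while stack:
        v = stack.pop()
        if v in visited:
            continue
        visited.add(v)
        order.append(v)
        # Push neighbors in reverse so they are explored in listed order.
        stack.extend(reversed(adj.get(v, [])))
    return order

adj = {0: [1, 2], 1: [3], 2: [3], 3: []}
print(iterative_dfs(adj, 0))  # [0, 1, 3, 2]
```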
- Research Article
- 10.3390/land13111962
- Nov 20, 2024
- Land
- Maomao Yan + 3 more
In the new era of vigorous digital and intelligent development, digital technology has penetrated widely into various fields. International standardization bodies such as the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC) have proposed a new standards concept, Standards Machine Applicable, Readable, and Transferable (SMART), to meet this trend. Its core feature is that a standard can be machine-readable, usable, understandable, and resolvable without human labor, so that the goals of standard formulation, promotion, publicity, and implementation can be achieved more effectively. Simultaneously, China's standardization industry is responding to the strategic deployment of "new quality productivity" by actively promoting the digital development of standards and establishing standard information databases, standard formulation management systems, and similar infrastructure, which provide data support and a platform basis for applying new technologies. Advanced technologies such as big data, artificial intelligence, blockchain, and knowledge graphs can be combined with standardization to improve the efficiency of standard development, application accuracy, and implementation effects. To align with these trends, this study analyzes how national and international standards in the field of urban sustainable development respond to the United Nations Sustainable Development Goals (UN-SDGs). It proposes an innovative approach that applies knowledge graph technology to the standardization of urban sustainable development and establishes a response correlation between the indicator library for cities' sustainable development (ILCSD) and the SDGs. It also provides additional functions, such as the intelligent extraction of cities' sustainable characteristic evaluation indicators and aided decision analysis, which greatly enhance the practicability and efficiency of the ILCSD as a technical tool. Based on knowledge graphs, this study analyzes the different responses of important standards in the field of urban sustainable development to the 17 SDGs, accurately identifies weak trends and gaps in standards, and provides a basis for improving the standardization system of urban sustainable development. By comparing national and international standards and technologies, the study also promotes the mutual recognition of standards, which can help China's urban sustainable development work align with international practice. In addition, the process of establishing and maintaining knowledge graphs facilitates the continuous adoption of new standards, through which the indicator library is automatically updated. Finally, we propose several insights for the standardization of urban sustainable development in China, such as an optimized standard system benchmarked against the SDGs and a localized application of the original SDG indicators.
- Research Article
- 10.1145/3676846
- Nov 19, 2024
- ACM Transactions on Architecture and Code Optimization
- Xinbiao Gan + 9 more
This article presents MST, a communication-efficient message library for fast graph traversal on exascale clusters. The key idea is to follow the multi-level network topology to perform topology-aware message aggregation, where small messages are gathered and scattered at each level of the domain hierarchy. To facilitate message aggregation, we equip MST with flexible buffer management, including active buffer switching and dynamic buffer expansion. We implement MST on the newest-generation Tianhe supercomputer and evaluate its performance using various traversal-centric algorithms on both synthetic trillion-scale graphs and real-world big graphs. The results show that MST-based graph traversal is orders of magnitude faster than traversal based on the Active Messages Library (AML). For the Graph500-BFS benchmark, MST-based Tianhe (with 77.2 K nodes) outperforms the Fugaku supercomputer (with 148.5 K nodes) by 18.53%, even though Fugaku is ranked No. 1 in the latest Graph500-BFS ranking (June 2023). MST also greatly improves graph processing performance on other commercial large-scale computing systems at the National Supercomputing Center in Changsha (NSCC) and WuzhenLight.
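A toy sketch of the aggregation pattern described, with invented names rather than MST's API: small messages are buffered per destination domain and flushed as one batch when a buffer fills, which is the point at which active buffer switching would occur.

```python
# Hedged sketch of per-domain message aggregation (not MST's implementation).
class AggregatingSender:
    def __init__(self, domain_of, send_batch, capacity=1024):
        self.domain_of = domain_of      # maps a vertex to its network domain
        self.send_batch = send_batch    # transport callback: (domain, msgs)
        self.capacity = capacity
        self.buffers = {}

    def send(self, dst_vertex, msg):
        d = self.domain_of(dst_vertex)
        buf = self.buffers.setdefault(d, [])
        buf.append((dst_vertex, msg))
        if len(buf) >= self.capacity:   # buffer full: ship one big batch
            self.flush(d)

    def flush(self, d):
        self.send_batch(d, self.buffers.pop(d, []))

    def flush_all(self):
        for d in list(self.buffers):
            self.flush(d)

# Usage: sender = AggregatingSender(lambda v: v % 4, print); sender.send(9, "x")
```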
- Research Article
- 10.3390/electronics13224455
- Nov 13, 2024
- Electronics
- Furong Chang + 5 more
Recently, numerous graph partitioning approaches have been proposed to distribute a big graph across the machines of a cluster for distributed computing. Due to heavy communication overhead, these approaches often suffer from long ingress times. Heavy communication overhead not only limits the scalability of distributed graph-parallel computing platforms but also reduces the overall performance of clusters. To address this problem, this work proposes a near-data-source parallel graph partitioning approach, denoted NDGP. In NDGP, an edge is preferentially assigned to the machine on which it is stored. We implemented NDGP on top of two classic graph partitioning approaches, Random and Greedy, and a recently proposed approach, OLPGP, and evaluated its effectiveness. Extensive experiments on real-world datasets verified that NDGP reduces communication overhead during graph partitioning and demonstrated that it does not add communication or computation overhead to the distributed graph computation that follows.
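The near-data rule is simple enough to sketch. The following is an assumption-laden illustration, not NDGP's implementation: each edge goes to the machine already storing it, unless that machine has hit a balance cap, in which case it falls back to the least-loaded machine.

```python
# Hedged sketch of near-data edge assignment with a balance constraint.
def ndgp_assign(edges, stored_on, n_machines, imbalance=1.05):
    """edges: iterable of (u, v); stored_on[(u, v)]: machine id holding the
    edge's data block. Returns an edge -> machine assignment."""
    load = [0] * n_machines
    cap = imbalance * len(edges) / n_machines   # per-machine load ceiling
    assignment = {}
    for e in edges:
        home = stored_on[e]                     # prefer the storing machine
        m = home if load[home] < cap else min(range(n_machines),
                                              key=load.__getitem__)
        assignment[e] = m
        load[m] += 1
    return assignment
```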
- Research Article
- 10.1002/widm.1570
- Nov 1, 2024
- WIREs Data Mining and Knowledge Discovery
- Xiao Haiyang + 4 more
Dissolution refers to the process in which solvent molecules and solute molecules attract and combine with each other. The extensive solubility data generated from the dissolution of various compounds under different conditions is distributed across structured or semi-structured formats in various media, such as text, web pages, tables, images, and databases. These data exhibit multi-source and unstructured features, aligning with the typical 5V characteristics of big data. A solubility big data technology system has emerged from the fusion of solubility data and big data technologies. However, the acquisition, fusion, storage, representation, and utilization of solubility big data face new challenges. Knowledge graphs, as extensive systems for representing and applying knowledge, can effectively describe entities, concepts, and relations across diverse domains. The construction of a solubility big data knowledge graph holds substantial value for the retrieval, analysis, utilization, and visualization of solubility knowledge. As a starting point for further discussion, this paper focuses on the solubility big data knowledge graph and, first, summarizes the architecture of solubility knowledge graph construction. Second, key technologies for solubility big data, such as knowledge extraction, knowledge fusion, and knowledge reasoning, are emphasized, along with a summary of the common machine learning methods used in knowledge graph construction. Furthermore, the paper explores application scenarios, such as knowledge question answering and recommender systems for solubility big data. Finally, it presents a prospective view of the shortcomings, challenges, and future directions related to the construction of a solubility big data knowledge graph. This article proposes the research direction of the solubility big data knowledge graph, which can provide technical references for constructing such a graph. At the same time, it serves as a comprehensive medium for describing data, resources, and their applications across diverse fields such as chemistry, materials, biology, energy, and medicine. It further aids in knowledge retrieval and mining, analysis and utilization, and visualization across various disciplines.
- Research Article
- 10.14778/3712221.3712240
- Nov 1, 2024
- Proceedings of the VLDB Endowment
- Yang Liu + 4 more
This paper develops Planar (Plug and play PRAM), a single-machine system for graph analytics that reuses existing PRAM algorithms without the need to design new parallel algorithms. Planar supports both out-of-core and in-memory analytics. When a graph is too big to fit into the memory of a machine, Planar adapts PRAM to the limited resources by extending a fixpoint model with multi-core parallelism, using disk as a memory extension. For an in-memory task, it dedicates all available CPU cores to the task and allows parallelly scalable PRAM algorithms to retain that property, i.e., the more cores are available, the shorter the runtime. We develop a graph partitioning and work scheduling strategy to accommodate subgraph I/O, balance memory usage, and reduce runtime, going beyond traditional partitioners for multi-machine systems. Using real-life graphs, we empirically verify that Planar outperforms state-of-the-art in-memory and out-of-core systems in efficiency and scalability.
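A minimal sketch of the fixpoint model described, with the I/O layer reduced to load/save callbacks (Planar's actual system is far more involved): iterate a PRAM-style update over disk-resident blocks until nothing changes.

```python
# Hedged sketch of out-of-core fixpoint iteration over graph partitions.
def fixpoint(partitions, load, save, update):
    """partitions: ids of subgraph blocks on disk; load/save move a block
    between disk and memory; update(block) returns True if any vertex value
    changed. Runs supersteps until a full pass makes no change."""
    changed = True
    while changed:                      # one superstep per outer iteration
        changed = False
        for p in partitions:            # only one block resident at a time
            block = load(p)
            if update(block):           # the PRAM-style, parallelizable step
                changed = True
            save(p, block)
```

With `update` set to, say, a label-propagation step for connected components, this loop converges once no label changes across any partition.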
- Research Article
- 10.1038/s41598-024-69643-6
- Aug 14, 2024
- Scientific Reports
- Igor Gaidai + 1 more
In this paper we consider the scalability of multi-angle QAOA (MA-QAOA) with respect to the number of QAOA layers. We find that MA-QAOA significantly reduces the depth of QAOA circuits, by a factor of up to 4 for the considered data sets. Moreover, MA-QAOA is less sensitive to system size, so we predict that this factor will be even larger for big graphs. However, MA-QAOA turns out not to be optimal for minimizing total QPU time. Different optimization initialization strategies are considered and compared for both QAOA and MA-QAOA. Among them, a new initialization strategy is suggested for MA-QAOA that consistently and significantly outperforms the random initialization used in previous studies.
- Research Article
- 10.14778/3685800.3685906
- Aug 1, 2024
- Proceedings of the VLDB Endowment
- Shuhao Liu + 2 more
We demonstrate PrismX (PRAM with SSDs as Memory eXtension), a single-machine system for graph analytics. PrismX allows users to make practical use of existing PRAM algorithms without any change. To cope with limited DRAM capacity, it employs NVMe SSDs as a memory extension. Leveraging graph preprocessing, PrismX implements a series of system optimization strategies that automatically and transparently adapt to the runtime workload, whether the computation is CPU-bound or I/O-bound. We demonstrate PrismX for (1) its ease of programming by reusing PRAM algorithms; (2) its efficiency compared with state-of-the-art graph systems, single-machine or multi-machine, in-memory or out-of-core; (3) the parallel scalability of in-memory PRAM algorithms, whose runtime shrinks as more CPU cores become available; and (4) its application to credit risk assessment.
- Research Article
- 10.1016/j.is.2024.102401
- May 6, 2024
- Information Systems
- Adnan Yazici + 1 more
BF-BigGraph: An efficient subgraph isomorphism approach using machine learning for big graph databases
- Research Article
- 10.1016/j.bdr.2024.100464
- May 1, 2024
- Big Data Research
- Aissam Aouar + 3 more
Scalable Diversified Top-k Pattern Matching in Big Graphs
- Research Article
- 10.1109/tcss.2022.3216587
- Feb 1, 2024
- IEEE Transactions on Computational Social Systems
- Myriam Jaouadi + 1 more
Social networks have attracted a great deal of attention and have changed the way we produce, consume, and diffuse information. This change gave rise to the notion of social influence, and today we speak of influential nodes. Detecting influential nodes in social networks aims to find entities that propagate information to a large portion of the network's users; this is often known as the influence maximization (IM) problem. Due to the explosive growth of social network data, network structure has become more complex, and we speak of "big graph data." Moreover, modern networks are dynamic, and their topology and/or information is likely to change over time. Detecting influential nodes in such networks is a challenging task. Several methods have been developed in this context; however, they concentrate on static networks, and there is little work on large-scale social networks. We propose in this article a new model for IM called MapReduce-based dynamic selection of influential nodes (MR-DSIN) that can cope with the huge size of real social networks. Our approach is based on a graph sampling step that reduces the network's size. Given the reduced version, MR-DSIN dynamically selects influential nodes. Our proposal has the advantage of considering the dynamics of information, which can be modeled by users' social actions (e.g., "share", "comment", "retweet"). Experimental results on real-world social networks and computer-generated artificial graphs demonstrate that MR-DSIN identifies influential nodes efficiently compared with three known proposals. We show that our model detects influence in the reduced graph that is as important as in the original one.
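A rough sketch of the sample-then-select pattern the model follows, with degree as a cheap stand-in for the paper's action-aware influence estimate; none of this is MR-DSIN's code.

```python
# Hedged sketch: sample a reduced graph, then greedily pick seed nodes.
import random
import networkx as nx

def sample_graph(G, keep=0.3, seed=1):
    rng = random.Random(seed)
    kept = [v for v in G if rng.random() < keep]   # node-sampling step
    return G.subgraph(kept).copy()

def select_influencers(G, k):
    # Degree used as a cheap stand-in for an influence-spread estimate.
    return sorted(G, key=G.degree, reverse=True)[:k]

G = nx.barabasi_albert_graph(1000, 3, seed=7)
seeds = select_influencers(sample_graph(G), k=10)
print(seeds)
```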
- Research Article
- 10.56028/aetr.9.1.257.2024
- Jan 2, 2024
- Advances in Engineering Technology Research
- Xuehe Zhuang + 2 more
The knowledge graph is a key technology of knowledge engineering in the era of big data. Its powerful semantic understanding and knowledge organization capabilities offer a better solution to problems such as the disordered and overly broad coverage of knowledge related to modern Chinese history. The core of this paper is to use machine learning and deep learning algorithms, supported by a big data knowledge graph, to analyze a user's question through natural language processing, match the analysis result against question templates to generate query statements, and run those queries over the rich semantic relations of the constructed knowledge graph. The close relationships between entities allow the system to return the most appropriate information to the user. The experimental results show that the designed question-answering system for modern Chinese history fills a gap in this field to a certain extent.
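A toy sketch of the template-matching step described above, using hypothetical Cypher-style templates; the system's actual templates and NLP analysis are not shown.

```python
# Hedged sketch: match a parsed question against templates to fill a query.
import re

TEMPLATES = [
    (re.compile(r"when did (?P<event>.+) happen"),
     "MATCH (e:Event {{name: '{event}'}}) RETURN e.date"),
    (re.compile(r"who led (?P<event>.+)"),
     "MATCH (p:Person)-[:LED]->(e:Event {{name: '{event}'}}) RETURN p.name"),
]

def to_query(question):
    q = question.lower().strip("?")
    for pattern, template in TEMPLATES:
        m = pattern.match(q)
        if m:
            return template.format(**m.groupdict())
    return None   # no template matched; fall back to other strategies

print(to_query("Who led the Xinhai Revolution?"))
```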
- Research Article
- 10.2478/amns-2024-3128
- Jan 1, 2024
- Applied Mathematics and Nonlinear Sciences
- Shaofeng Bai + 6 more
In this paper, we use big data techniques to screen relevant data on factors influencing charging safety and perform data cleaning to build a charging-safety influencing-factors dataset. BERT is selected as the baseline model for the named entity recognition task and combined with a CRF model to exclude irrelevant features, yielding an entity recognition model that fits the knowledge graph. Introducing a safety database, we propose a graph attention network model that simultaneously captures the structural features and the textual description features of the safety knowledge graph, improving knowledge graph relation extraction. A high-frequency charging-safety dataset and a random dataset are used as experimental samples to compare and analyze the performance of the BERT-CRF named entity recognition model on each metric. The link prediction task is evaluated with the structure- and text-based graph attention network model and analyzed against three benchmark models. Overall, the test results show that the BERT-CRF model learns 90% of the lexicon's knowledge and passes the model test, with each evaluation metric in the range of 0.9 to 1.0 under a large-data-volume experimental setting. The proposed graph attention network model using structure and text achieves better link prediction performance than the other models and performs well on the FB15K-237 dataset.
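As an illustration of the CRF decoding step mentioned above, here is a plain-numpy Viterbi decoder over per-token label scores such as a BERT encoder would emit; shapes, labels, and values are made up.

```python
# Hedged sketch of CRF Viterbi decoding over token-label emission scores.
import numpy as np

def viterbi(emissions, transitions):
    """emissions: (T, L) token-label scores; transitions: (L, L) scores for
    label[i] -> label[j]. Returns the highest-scoring label path."""
    T, L = emissions.shape
    score = emissions[0].copy()
    back = np.zeros((T, L), dtype=int)
    for t in range(1, T):
        # cand[i, j] = best score ending at t with prev label i, cur label j.
        cand = score[:, None] + transitions + emissions[t][None, :]
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):       # backtrack from the last token
        path.append(int(back[t, path[-1]]))
    return path[::-1]

labels = ["O", "B-FACTOR", "I-FACTOR"]
em = np.array([[2., 1., 0.], [0., 2., 1.], [0., 0., 3.]])
tr = np.array([[0., 0., -9.], [-9., 0., 1.], [0., -9., 1.]])  # forbid O->I etc.
print([labels[i] for i in viterbi(em, tr)])  # ['O', 'B-FACTOR', 'I-FACTOR']
```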
- Research Article
- 10.1016/j.procs.2024.09.113
- Jan 1, 2024
- Procedia Computer Science
- Hao Zhu + 2 more
A Method for Constructing an Intelligent Maintenance Assistance System for Hydroelectric Stations