Queryable Compression on Time-evolving Web and Social Networks with Streaming
Time-evolving web and social network graphs are modeled as a set of pages/individuals (nodes) and their arcs (links/relationships) that change over time. Due to their popularity, they have become increasingly massive in terms of their number of nodes, arcs, and lifetimes. However, these graphs are extremely sparse throughout their lifetimes. For example, it is estimated that Facebook has over a billion vertices, yet at any point in time, it has far fewer than 0.001% of all possible relationships. These large sparse graphs may not fit in most main memories when stored using underlying representations such as a series of adjacency matrices or adjacency lists. We propose building a compressed data structure that has a compressed binary tree corresponding to each row of each adjacency matrix of the time-evolving graph. We do not explicitly construct the adjacency matrix; our algorithms take the time-evolving arc list representation as input for construction. Our compressed structure allows for directed and undirected graphs, faster arc and neighborhood queries, as well as the ability for arcs and frames to be added and removed directly from the compressed structure (streaming operations). We use publicly available network data sets such as Flickr, Yahoo!, and Wikipedia in our experiments and show that our new technique performs as well as or better than our benchmarks on all datasets in terms of compression size and other vital metrics.
- Research Article
5
- 10.1109/access.2019.2912172
- Jan 1, 2019
- IEEE Access
With the rapid growth of the Internet, the scale of graphs has increased dramatically, which poses special challenges in representing both web graphs and social network graphs. In the adjacency matrix of web and social network graphs, only a very small proportion of the elements are 1s. Furthermore, we find that using the aggregation of scattered 1s to form a high density of adjacency matrices is beneficial to the compression of storage space. Based on these findings, we propose the DGC-K²-tree compression approach based on K²-tree, which can greatly increase the density of 1s among the existing algorithms and adequately compress the blank area in the adjacency matrix. Then, we design a query algorithm for this mechanism to support the operation on the graph. The experimental results show that compared with the state-of-the-art algorithms, including the K²-tree based on a diagonal clustering mechanism (K²-BDC), the K²-tree, Re-Pair, and LZ78, our approach achieves better compression ratio and shorter time consumption. In terms of storage efficiency, our approach reduces the space by an average of 34.07% compared to the best performing algorithm K²-BDC. In terms of query efficiency, our approach reduces the time by an average of 80.63% compared to the best performing algorithm LZ78.
- Conference Article
14
- 10.1109/bigdata.2017.8258020
- Dec 1, 2017
In this era of social networks, we find ourselves with a collection of massive, changing graphs. Each of these graphs contains a set of nodes (individuals) and a set of edges among the nodes (relationships). How a graph is represented in a data structure determines what information is easy to obtain from it. However, many graphs are so large that even basic data structure representations (e.g. adjacency lists) do not fit in main memory. Therefore, it is an interesting field of study to design compressed data structures that facilitate certain query functions. Since we are dealing with social networks, our structure will also be able to stream edges directly into the compressed graph. We introduce our social network compressed data structure as an indexed array of compressed binary trees. We further minimize memory overhead by directly constructing the graph without any intermediate structure. We also provide fast access methods for edge existence (does an edge exist between two nodes?), neighbor queries (list a node's neighbors), and streaming operations (add/remove nodes/edges). We test our algorithms on public, anonymized, massive graphs such as Friendster, Live-Journal, Pokec, Twitter, and others. Our empirical evaluation is based on several parameters including time to compress, memory required by the compression algorithm, size of compressed graph, and time to execute queries.
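The abstract above does not spell out how its compressed binary trees are laid out, so the following is only a minimal sketch of the general idea, under the assumption that each row of the adjacency matrix is covered by a binary tree in which a missing child stands for an all-zero half of the row. The class name `RowTree` and its methods are hypothetical, and the row width is assumed to be a power of two (at least 2).

```python
class RowTree:
    """Binary tree over one adjacency-matrix row (hypothetical sketch).

    A child that is None represents an all-zero half of the row, so a
    sparse row costs roughly O(degree * log(size)) nodes.
    """

    def __init__(self, size):
        self.size = size   # row width, assumed to be a power of two >= 2
        self.root = None   # None means the whole row is zero

    def add(self, col):
        """Streaming insert of the arc (this row -> col)."""
        if self.root is None:
            self.root = [None, None]
        node, lo, hi = self.root, 0, self.size
        while True:
            mid = (lo + hi) // 2
            side = 0 if col < mid else 1
            lo, hi = (lo, mid) if side == 0 else (mid, hi)
            if hi - lo == 1:          # reached a single column
                node[side] = True
                return
            if node[side] is None:
                node[side] = [None, None]
            node = node[side]

    def has(self, col):
        """Edge-existence query."""
        node, lo, hi = self.root, 0, self.size
        while node is not None:
            mid = (lo + hi) // 2
            side = 0 if col < mid else 1
            lo, hi = (lo, mid) if side == 0 else (mid, hi)
            if hi - lo == 1:
                return node[side] is True
            node = node[side]
        return False

    def remove(self, col):
        """Streaming delete; subtrees that become empty are pruned."""
        def prune(node, lo, hi):
            if node is None:
                return None
            mid = (lo + hi) // 2
            side = 0 if col < mid else 1
            if hi - lo == 2:          # children are single columns
                node[side] = None
            else:
                bounds = (lo, mid) if side == 0 else (mid, hi)
                node[side] = prune(node[side], *bounds)
            return node if (node[0] is not None or node[1] is not None) else None
        self.root = prune(self.root, 0, self.size)

    def neighbors(self):
        """Neighbor query: columns holding a 1, in increasing order."""
        out = []
        def walk(node, lo, hi):
            if node is None:
                return
            if node is True:
                out.append(lo)
                return
            mid = (lo + hi) // 2
            walk(node[0], lo, mid)
            walk(node[1], mid, hi)
        walk(self.root, 0, self.size)
        return out
```

An "indexed array" of such trees, one per node, would then give a whole-graph structure; a bit-packed encoding of the tree shape would be needed for real compression, which this pointer-based sketch does not attempt.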
- Research Article
- 10.11113/matematika.v41.n2.1568
- Aug 1, 2025
- MATEMATIKA
Tudung saji is a traditional food cover that is commonly used in Malaysia. In previous research, some triaxial patterns of tudung saji have been shown to be isomorphic to some groups. One of the common tudung saji patterns is called Kapal Layar (Sailboat), and this pattern is isomorphic to the cyclic group of order six. Then, a new graph called the tudung saji graph is introduced, depending on the sections of the triaxial template of the tudung saji pattern. This graph is constructed with the set of vertices consisting of the elements of the Kapal Layar pattern, in which two vertices are connected by an edge if the corresponding strands for both vertices are equal. Next, from the structure of the tudung saji graph of the Kapal Layar pattern and the isomorphism of the pattern to the ring of integers modulo six, the adjacency matrix of the graph is constructed. Finally, the energy of the graphs of the ring associated with the Kapal Layar pattern is computed.
- Research Article
2
- 10.7463/0517.0001159
- May 3, 2017
- Science and Education of the Bauman MSTU
Online social networks (such as Facebook, Twitter, VKontakte, etc.) are an important channel for disseminating information and are often used to influence social consciousness for various purposes, from advertising products or services to full-scale information war, which makes them a very relevant object of research. The paper reviews methods for analysing social networks (primarily online ones) based on the spectral theory of graphs. Such methods use the spectrum of the social graph, i.e. the set of eigenvalues of its adjacency matrix, as well as the eigenvectors of the adjacency matrix. The paper describes measures of centrality (in particular, eigenvector centrality and PageRank), which reflect the degree of impact that a given user of the social network has. The very popular PageRank measure takes, as the centrality of the graph vertices, the final (stationary) probabilities of a Markov chain whose matrix of transition probabilities is calculated from the adjacency matrix of the social graph. The vector of final probabilities is an eigenvector of the matrix of transition probabilities. The paper presents a method of dividing the graph vertices into two groups, based on maximizing the network modularity by computing an eigenvector of the modularity matrix. It also considers a method for detecting bots based on a non-randomness measure of a graph, computed from the spectral coordinates of vertices, i.e. sets of eigenvector components of the adjacency matrix of the social graph. In general, there are a number of algorithms for analysing social networks based on the spectral theory of graphs. These algorithms show very good results, but their disadvantage is a relatively high (albeit polynomial) computational complexity for large graphs. At the same time, the practical potential of spectral graph theory methods is still underestimated, and they may be used as a basis for developing new methods.
The work was carried out with the support from the RFBR grant No. 16-29-09517.
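The Markov-chain view of PageRank described in the abstract above can be sketched with plain power iteration. This is an illustrative sketch only, not the paper's code: the function name `pagerank`, the adjacency-list input format, the damping factor of 0.85, and the uniform treatment of vertices with no outgoing arcs are all assumptions taken from the standard formulation.

```python
def pagerank(adj, damping=0.85, iters=100):
    """Approximate the stationary probabilities of the PageRank Markov chain.

    adj[u] is the list of vertices that u links to (assumed input format).
    """
    n = len(adj)
    rank = [1.0 / n] * n
    for _ in range(iters):
        # Teleportation term: with probability (1 - damping) jump anywhere.
        new = [(1.0 - damping) / n] * n
        for u, nbrs in enumerate(adj):
            if nbrs:
                share = damping * rank[u] / len(nbrs)
                for v in nbrs:
                    new[v] += share
            else:
                # A vertex with no out-links spreads its rank uniformly.
                for v in range(n):
                    new[v] += damping * rank[u] / n
        rank = new
    return rank


# Vertex 2 is linked from both 0 and 1, so it should outrank vertex 1.
ranks = pagerank([[1, 2], [2], [0]])
```

The returned vector approximates the eigenvector (for eigenvalue 1) of the transition matrix; each iteration is one step of the chain applied to the current distribution.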
- Research Article
29
- 10.1016/j.laa.2016.01.033
- Jan 29, 2016
- Linear Algebra and its Applications
On the mixed adjacency matrix of a mixed graph
- Research Article
34
- 10.1007/s00006-008-0116-5
- Jun 16, 2008
- Advances in Applied Clifford Algebras
This paper expands on the graph-theoretic content of my contributed talk at the Seventh International Conference on Clifford Algebras and Their Applications. A well-known result in graph theory states that when A is the adjacency matrix of a finite graph G, the entries of \(A^k\) represent numbers of k-step walks existing in G. However, the adjacency matrix fails to distinguish between walks and “self-avoiding” walks (i.e., walks without repeated vertices). Utilizing elements of abelian, nilpotent-generated algebras, a “new” adjacency matrix is associated with a finite graph on n vertices. By considering entries of \({\mathcal{A}}^k\), where \({\mathcal{A}}\) is an appropriate nilpotent adjacency matrix, one is able to recover the self-avoiding k-walks in any finite graph. In particular, a graph’s Hamiltonian cycles are enumerated by the top-form coefficient in the trace of \({\mathcal{A}}^n\) when n is the number of vertices in the graph. By considering the lth power of the trace of \({\mathcal{A}}^k\), l-tuples of pairwise-disjoint k-cycles are recovered. By defining a nilpotent transition matrix associated with a time-homogeneous Markov chain, a method of computing probabilities of self-avoiding random walks on finite graphs is developed. Expected hitting times of specific states in Markov chains and expected times of first self-intersection of random walks are also recovered using these methods. The algebra used to define the nilpotent adjacency matrix of a graph on n vertices is not itself a Clifford algebra, but it can be constructed within the 2n-particle fermion algebra \({\mathcal{C}}\ell_{2n,2n}\), indicating potential connections to quantum computing.
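The classical walk-counting result quoted above can be checked on a small example. The sketch below covers only the ordinary adjacency matrix, not the nilpotent construction of the paper; the helper names `mat_mul` and `mat_pow` are hypothetical.

```python
def mat_mul(X, Y):
    """Multiply two square integer matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][m] * Y[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(A, k):
    """Return A**k by repeated multiplication (A**0 is the identity)."""
    n = len(A)
    result = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = mat_mul(result, A)
    return result


# Triangle graph on vertices 0, 1, 2.
A = [[0, 1, 1],
     [1, 0, 1],
     [1, 1, 0]]

A2 = mat_pow(A, 2)   # A2[i][j] = number of 2-step walks from i to j
A3 = mat_pow(A, 3)
closed_3_walks = sum(A3[i][i] for i in range(3))
```

Here `A2[0][0]` is 2 because the walks 0-1-0 and 0-2-0 both revisit vertex 0: the ordinary power counts them even though they are not self-avoiding, which is the limitation the abstract points to.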
- Research Article
32
- 10.1007/s10231-016-0608-1
- Sep 2, 2016
- Annali di Matematica Pura ed Applicata (1923 -)
In this paper we continue a research project concerning the study of a graph from the perspective of granular computation. To be more specific, we interpret the adjacency matrix of any simple undirected graph G in terms of a data information table, which is one of the most studied structures in database theory. Granular computing (abbreviated GrC) is a well-developed research field in applied and theoretical information sciences; nevertheless, in this paper we direct our efforts toward a purely mathematical development of the link between GrC and graph theory. From this perspective, the well-studied notion of indiscernibility relation in GrC becomes a symmetry relation with respect to a given vertex subset in graph theory; therefore, the investigation of this symmetry relation turns out to be the main object of study in this paper. In detail, we study a simple undirected graph G by assuming a generic vertex subset W as a reference system with respect to which we examine the symmetry of all vertex subsets of G. The change of perspective from G without a reference system to the pair (G, W) is similar to what occurs in the transition from an affine space to a vector space. We interpret the symmetry blocks in the reference system (G, W) as particular equivalence classes of vertices in G, and we study the geometric properties of all reference systems (G, W), when W runs over all vertex subsets of G. We also introduce three hypergraph models and a vertex set partition lattice associated to G, by taking as general models of reference several classical notions of GrC. For all these constructions, we provide a geometric characterization and we determine their structure for basic graph families. Finally, we apply a large part of our work to study the important case of the Petersen graph.
- Book Chapter
- 10.1007/978-1-4471-6569-9_5
- Jan 1, 2014
In the first section of this chapter we give a rigorous derivation of the Perron–Frobenius Theorem, restricting our attention to the adjacency matrix of a graph. Some bounds for the Perron root of the adjacency matrix are obtained. As an application, we derive Turán’s Theorem on triangle-free graphs. The next section is devoted to a basic introduction to the adjacency algebra, which is the algebra generated by the adjacency matrix and its powers. For a regular graph, the adjacency matrix and the Laplacian differ only by a scalar matrix. This enables us to explore the relationship between the adjacency matrix of a regular graph and that of its complement and line graph. Several results in this direction are proved in the next section. In the final section we derive spectral properties of strongly regular graphs and apply them to derive the well-known Friendship Theorem.
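The remark about regular graphs can be made explicit with the standard definitions, where the symbols below are notation assumed here rather than taken from the chapter. For a k-regular graph G on n vertices with adjacency matrix A, degree matrix D = kI, all-ones matrix J, and complement \(\overline{G}\):

$$L = D - A = kI - A, \qquad \lambda_i(L) = k - \lambda_i(A), \qquad A(\overline{G}) = J - I - A.$$

So the Laplacian spectrum is the adjacency spectrum reflected and shifted by k, and the adjacency matrix of the complement is a polynomial expression in A and J.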
- Research Article
2
- 10.1007/s11277-017-4087-5
- Mar 15, 2017
- Wireless Personal Communications
Graphs are widely used to model data in various applications. With the rapid growth of many emerging applications such as the Internet of Things, there is an urgent need for the capability to process large-scale graphs with billions of vertices. The Web graph is a typical case of graph data that is widely used for analyzing the structure, behavior and evolution of the World Wide Web. In this paper, we focus on optimal representation of large-scale Web graphs. Our work is motivated by the need to fit large-scale graphs into main memory and carry out analyses on them. By analyzing the adjacency matrix of Web graphs, we find two characteristics of the distribution of 1s in the matrix. Firstly, only a very small proportion of elements in the matrix are 1s. Secondly, the majority of 1s gather around the principal diagonal and form a small number of clusters in the matrix. Based on these characteristics, we first develop a clustering mechanism to locate the clusters of 1s in the adjacency matrix. Then, we combine this clustering mechanism with a structure named K2-tree and propose an approach for representing large-scale Web graphs compactly. The basic idea of the approach is to compress a large number of zeros into a single zero. Experimental results show that our approach not only reduces the space for representing a Web graph, but also reduces the time consumption for operations such as retrieving the neighbors of any node in the graph; compared with existing approaches, our approach achieves the best space/time tradeoff.
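The idea of compressing a large all-zero region into a single zero can be illustrated with a plain K2-tree, without the clustering mechanism that the abstract adds on top. This is a minimal sketch under stated assumptions: the function names `build_k2tree` and `has_edge` are hypothetical, the matrix side `n` is assumed to be a power of `k`, and rank is computed by a naive sum rather than the constant-time rank structure a real implementation would use.

```python
def build_k2tree(edges, n, k=2):
    """Return (T, L): bits of the internal levels and of the last level.

    Each level splits a submatrix into k*k blocks; a block with no 1s is
    recorded as a single 0 bit and never expanded further.
    """
    T, L = [], []
    level = [(0, 0, n, list(edges))]   # (row0, col0, size, edges inside)
    while level:
        nxt = []
        for r0, c0, size, es in level:
            sub = size // k
            for i in range(k):
                for j in range(k):
                    rr, cc = r0 + i * sub, c0 + j * sub
                    inside = [(r, c) for r, c in es
                              if rr <= r < rr + sub and cc <= c < cc + sub]
                    bit = 1 if inside else 0
                    if sub == 1:
                        L.append(bit)          # single matrix cell
                    else:
                        T.append(bit)
                        if bit:
                            nxt.append((rr, cc, sub, inside))
        level = nxt
    return T, L


def has_edge(T, L, n, r, c, k=2):
    """Check cell (r, c) by descending one block per level."""
    bits = T + L
    base, size = 0, n
    while True:
        sub = size // k
        p = base + (r // sub) * k + (c // sub)
        if bits[p] == 0:
            return False
        if sub == 1:
            return True
        # Children of the node at position p start at rank1(T, p) * k*k.
        base = sum(T[:p + 1]) * k * k
        r, c, size = r % sub, c % sub, sub


edges = {(0, 1), (2, 3), (3, 0)}
T, L = build_k2tree(edges, 4)
```

In this 4x4 example the empty top-right quadrant is stored as one 0 bit in `T`; on a large sparse matrix most of the area collapses this way, which is where the space saving comes from.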
- Research Article
- 10.4233/uuid:7a2dcf0b-e88e-4da6-baa9-59cfc4cc63ad
- Mar 25, 2013
- Research Repository (Delft University of Technology)
Robustness and Optimization of Complex Networks: Reconstructability, Algorithms and Modeling
- Research Article
11
- 10.1016/j.disc.2006.06.003
- Jul 26, 2006
- Discrete Mathematics
Graph polynomials from principal pivoting
- Research Article
1
- 10.1142/s1793830922501026
- Jun 14, 2022
- Discrete Mathematics, Algorithms and Applications
Let [Formula: see text] be the adjacency matrix of a graph [Formula: see text]. Let [Formula: see text] denote the row entries of [Formula: see text] corresponding to the vertex [Formula: see text] of [Formula: see text]. The Hamming distance between the strings [Formula: see text] and [Formula: see text] is the number of positions in which [Formula: see text] and [Formula: see text] differ. In this paper, we study the Hamming distance between the strings generated by the adjacency matrix of subgraph complement of a graph. We also compute sum of Hamming distances between all pairs of strings generated by the adjacency matrix of [Formula: see text].
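The Hamming distance between adjacency-matrix rows, as defined in the abstract above, can be sketched for an ordinary graph as follows. The sketch does not implement the subgraph complement studied in the paper; the function names and the example path graph are assumptions for illustration.

```python
from itertools import combinations


def hamming(row_u, row_v):
    """Number of positions in which two equal-length 0/1 rows differ."""
    return sum(1 for a, b in zip(row_u, row_v) if a != b)


def hamming_sum(A):
    """Sum of Hamming distances over all unordered pairs of rows of A."""
    return sum(hamming(A[u], A[v]) for u, v in combinations(range(len(A)), 2))


# Path graph 0 - 1 - 2.
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]

d01 = hamming(A[0], A[1])   # rows 010 and 101 differ in every position
d02 = hamming(A[0], A[2])   # the two end vertices have identical rows
total = hamming_sum(A)
```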
- Book Chapter
- 10.1007/978-981-19-9307-7_10
- Jan 1, 2022
A continuous-time quantum walk on a graph evolves according to the unitary operator $$e^{-iAt}$$, where A is the adjacency matrix of the graph. Perfect state transfer (PST) in a quantum walk is the transfer of a quantum state from one node of a graph to another node with $$100\%$$ fidelity. It can be shown that the adjacency matrix of a cubelike graph is a finite sum of tensor products of Pauli X operators. We use this fact to construct an efficient quantum circuit for the quantum walk on cubelike graphs. In [5, 15], a characterization of integer weighted cubelike graphs is given that exhibit periodicity or PST at time $$t=\pi/2$$. We use our circuits to demonstrate PST or periodicity in these graphs on IBM’s quantum computing platform [1, 10].
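As a rough classical illustration (not the quantum circuit construction of the chapter), perfect state transfer can be checked numerically on the simplest cubelike graph. The sketch assumes the unweighted n-dimensional hypercube, whose adjacency matrix is the sum of one Pauli X operator per qubit; because these terms commute, the walk operator factors into a tensor product of single-qubit factors cos(t) I - i sin(t) X. The function names are hypothetical.

```python
import math


def kron(A, B):
    """Kronecker (tensor) product of two matrices given as lists of rows."""
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]


def walk_unitary(n, t):
    """Walk operator exp(-iAt) for the n-dimensional hypercube (assumed graph)."""
    c, s = math.cos(t), math.sin(t)
    single = [[c, -1j * s],
              [-1j * s, c]]          # exp(-iXt) for one qubit
    U = [[1]]
    for _ in range(n):
        U = kron(U, single)
    return U


def transfer_fidelity(n, t, src, dst):
    """Probability of finding the walker at dst at time t, starting at src."""
    U = walk_unitary(n, t)
    return abs(U[dst][src]) ** 2


# Vertex 0 (bits 000) and vertex 7 (bits 111) are antipodal in the 3-cube.
f = transfer_fidelity(3, math.pi / 2, 0, 7)
```

At t = pi/2 every single-qubit factor equals -iX, so the state on vertex 000 moves entirely to 111 up to a global phase, and `f` is 1 up to floating-point rounding.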
- Research Article
3
- 10.1016/j.laa.2019.11.030
- Dec 4, 2019
- Linear Algebra and its Applications
Characteristic polynomials and zeta functions of equitably partitioned graphs
- Conference Article
3
- 10.1109/ic3.2018.8530624
- Aug 1, 2018
Online social networking has progressively become a new interdisciplinary research area, especially for developing new strategies for investigating these informal networks containing billions of users. However, such networks might not represent real-world connections among people, either because of imperfect forms of data procurement or because the connections are not yet reflected on the online platform; for example, friends in the real world might not be connected with each other online. Predicting these unknown connections in the online community is still an open problem. In this paper, a novel link prediction method is proposed to find the missing connections in social network graphs. The proposed method extracts topological features from the network graph, which are used to train an ensemble learning model, namely a random forest classifier. The trained model is used to predict the missing connections. The experimental evaluation is conducted on two networking datasets, the ‘Facebook networking dataset’ and the ‘Flickr following dataset’, publicly available from the Stanford Network Analysis Project (SNAP) and the Koblenz Network Collection (KONECT), respectively. The comparison is made with the prediction results obtained on the same features by state-of-the-art learning models, namely linear support vector machine (LSVM), K-Nearest Neighbours (KNN), AdaBoost, and Gradient Boost. The performance of the considered methods is measured in terms of accuracy, precision, recall, F1-measure, and AUC value. Additionally, the efficiency of the proposed method is validated against the existing link prediction method. The experimental results show that the proposed method is more accurate than the compared methods in uncovering the hidden links of a social network.
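The abstract does not say which topological features are extracted, so the sketch below uses three common neighborhood-based scores (common neighbors, Jaccard coefficient, preferential attachment) purely as an assumed example. The function name `link_features` and the dictionary-of-sets graph format are hypothetical; feature vectors of this kind could then be fed to a classifier such as a random forest.

```python
def link_features(adj, u, v):
    """Topological features for the candidate link (u, v).

    adj maps each vertex to the set of its neighbors (assumed format).
    """
    nu, nv = adj[u], adj[v]
    common = nu & nv
    union = nu | nv
    return {
        "common_neighbors": len(common),
        "jaccard": len(common) / len(union) if union else 0.0,
        "preferential_attachment": len(nu) * len(nv),
    }


adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
features = link_features(adj, "a", "d")   # a and d share the neighbor c
```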