In-memory Algorithm Research Articles

This paper revisits the set containment join (SCJ) problem, which uses the subset relationship (i.e., ⊆) as the condition to join set-valued attributes of two relations and has many fundamental applications in commercial and scientific fields. Existing in-memory algorithms for SCJ are either signature-based or prefix-tree-based. The former incur high CPU cost because they enumerate signatures, while the latter incur high space cost because they store prefix trees. This paper proposes a new adaptive, parameter-free in-memory algorithm, named frequency-hashjoin, or FreshJoin for short, to evaluate SCJ efficiently. FreshJoin builds a flat index on the fly to record three kinds of signatures (i.e., the two least frequent elements and a hash signature whose length is determined adaptively by the frequencies of elements in the universe set). The index consists of two sparse inverted indices and two arrays that record the hash signatures of all sets in each relation. The index is organized so that FreshJoin can avoid enumerating hash signatures. The rationale for this design is explained, and the time and space costs of the proposed algorithm, which provide a rule for choosing between FreshJoin and existing algorithms, are analyzed. Experiments on 16 real-life datasets show that FreshJoin usually reduces space cost by more than 50% while remaining competitive with the state-of-the-art algorithms in running time.
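The filtering idea described in this abstract can be illustrated with a small sketch. This is not the FreshJoin implementation: it is a simplified illustration that assumes a single least frequent element per set (the abstract uses two), a fixed 64-bit hash signature (the abstract chooses the length adaptively), and one inverted index over the second relation only. The names `containment_join`, `signature`, and `SIG_BITS` are hypothetical.

```python
from collections import Counter, defaultdict

SIG_BITS = 64  # assumption: fixed length; the abstract picks it adaptively


def signature(s, bits=SIG_BITS):
    """Bitmask with one bit set per element (a simple hash signature)."""
    sig = 0
    for e in s:
        sig |= 1 << (hash(e) % bits)
    return sig


def containment_join(R, S):
    """Return sorted (i, j) pairs such that R[i] is a subset of S[j]."""
    # Element frequencies in S decide which element of r is the rarest probe.
    freq = Counter(e for s in S for e in s)
    # Inverted index: element -> ids of the sets in S that contain it.
    inverted = defaultdict(list)
    for j, s in enumerate(S):
        for e in s:
            inverted[e].append(j)
    s_sigs = [signature(s) for s in S]

    result = []
    for i, r in enumerate(R):
        if not r:
            # The empty set is contained in every set.
            result.extend((i, j) for j in range(len(S)))
            continue
        rarest = min(r, key=lambda e: freq[e])  # missing elements count as 0
        r_sig = signature(r)
        for j in inverted.get(rarest, ()):
            # Signature filter: every bit of r must also be set in s.
            # Survivors are verified with an exact subset test.
            if r_sig & ~s_sigs[j] == 0 and r <= S[j]:
                result.append((i, j))
    return sorted(result)
```

Probing with the rarest element keeps the candidate list short, and the bitmask test discards most false candidates before the exact subset check. Because a signature match can be a false positive but never a false negative, the final `r <= S[j]` check is still required.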

The k-truss of a graph is the largest edge-induced subgraph such that every edge is contained in at least k triangles within the subgraph, where a triangle is a cycle of three vertices. As a new notion of cohesive subgraph, the truss has recently attracted much research attention in the database and data mining fields. At the same time, uncertainty is an intrinsic property of massive graph data, and truss decomposition (i.e., finding all k-trusses of a graph) has become a key primitive on uncertain graphs. In this paper, we study the truss decomposition problem on uncertain graphs, that is, finding all highly probable k-trusses of an uncertain graph. We first give a formal statement of the truss decomposition problem on uncertain graphs. Then, we prove that the truss decomposition of an uncertain graph has two elegant properties, namely uniqueness and hierarchy. We show that the truss decomposition of an uncertain graph can be found in $$O(m^{1.5}Q)$$ time by proposing an in-memory algorithm called $$\mathtt {TD_{mem}}$$, where m is the number of edges of the uncertain graph, and Q is at most the maximum number of common neighbors of the endpoints of an edge. When an uncertain graph is too large to fit into main memory, we propose an external-memory algorithm $$\mathtt {TD_{I/O}}$$ to find the truss decomposition of the uncertain graph. Extensive experiments have been carried out to evaluate the practical performance of the proposed algorithms. The experimental results verify that both $$\mathtt {TD_{mem}}$$ and $$\mathtt {TD_{I/O}}$$ are efficient when an uncertain graph is small enough to fit into main memory, and that $$\mathtt {TD_{I/O}}$$ is much faster than $$\mathtt {TD_{mem}}$$ when the graph is too large to fit into main memory.
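To make the k-truss definition concrete, here is a minimal sketch of truss decomposition on an ordinary (deterministic) graph by repeated edge peeling. It is not the paper's $$\mathtt {TD_{mem}}$$ or $$\mathtt {TD_{I/O}}$$: it ignores edge probabilities entirely and uses a simple quadratic-time loop. It follows the convention stated in this abstract (every edge in at least k triangles); some other papers state the threshold as k - 2. The function name `truss_numbers` is hypothetical, and vertices are assumed to be comparable values such as integers, with no self-loops.

```python
def truss_numbers(edges):
    """Map each edge to the largest k such that the edge lies in the k-truss,
    where every edge of a k-truss is in at least k triangles of the subgraph."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    def key(u, v):
        # Canonical (small, large) form so each undirected edge has one key.
        return (u, v) if u <= v else (v, u)

    # Support of an edge = number of triangles containing it
    # = number of common neighbors of its endpoints.
    support = {key(u, v): len(adj[u] & adj[v]) for u, v in edges}

    truss = {}
    k = 0
    while support:
        # Peel the edge with the smallest remaining support.
        e = min(support, key=support.get)
        k = max(k, support[e])
        u, v = e
        # Removing (u, v) destroys one triangle for each common neighbor w.
        for w in adj[u] & adj[v]:
            support[key(u, w)] -= 1
            support[key(v, w)] -= 1
        adj[u].discard(v)
        adj[v].discard(u)
        del support[e]
        truss[e] = k
    return truss
```

For example, on a 4-clique with one extra pendant edge, the pendant edge is in no triangle and gets truss number 0, while each clique edge is in two triangles and gets truss number 2. Scanning for the minimum on every step keeps the sketch short; a bucket queue keyed by support would avoid that repeated scan.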

Articles published on In-memory Algorithm

PgRC: pseudogenome-based read compressor.

FreshJoin: An Efficient and Adaptive Algorithm for Set Containment Join

Truss decomposition of uncertain graphs

Efficient Distributed RNN Query Processing with Caching

Truss decomposition in massive networks

Practical algorithms for unsatisfiability proof and core generation in SAT solvers
