An Efficient Two-Level-Partitioning-Based Double Array and Its Parallelization

Lianyin Jia, Jiaman Ding, Yinong Chen, Mengjuan Li, Yong Liu, Chongde Zhang

doi:10.3390/app10155266

Open Access

https://doi.org/10.3390/app10155266

Abstract

The trie is one of the most common data structures for string storage and retrieval. As a fast and efficient implementation of the trie, the double array (DA) can effectively compress strings to reduce storage space. However, this method suffers from low index construction efficiency. To address this problem, we design a two-level partition (TLP) framework in this paper. We first divide the dataset into smaller lower-level partitions, and then we merge these partitions into bigger upper-level partitions using a min-heap-based greedy merging algorithm (MH-GMerge). TLP provides good load balancing and can be easily parallelized. We implemented two efficient parallel partitioned DAs based on TLP. Extensive experiments were carried out, and the results showed that the proposed methods can significantly improve the construction efficiency of DA and can achieve a better trade-off between construction and retrieval performance than the existing state-of-the-art methods.
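The two-level idea in the abstract can be illustrated with a minimal sketch. It is not the authors' MH-GMerge implementation, whose details are not given on this page. The sketch assumes that lower-level partitions are formed by first character and that load is measured as the number of strings; both choices, and the function names `lower_level_partition` and `greedy_merge`, are hypothetical. The merge step is the standard "largest first, into the lightest bin" greedy heuristic driven by a min-heap.

```python
import heapq
from collections import defaultdict


def lower_level_partition(strings):
    """Group strings into small partitions keyed by first character.

    The partitioning key is an assumption made for illustration only.
    """
    parts = defaultdict(list)
    for s in strings:
        if s:  # empty strings have no first character; skip them
            parts[s[0]].append(s)
    return dict(parts)


def greedy_merge(lower_parts, k):
    """Merge lower-level partitions into k upper-level partitions.

    Partitions are visited from largest to smallest, and each one is
    appended to the upper-level partition with the smallest current load.
    A min-heap of (load, index) pairs yields that partition in O(log k).
    """
    if k <= 0:
        raise ValueError("k must be positive")
    heap = [(0, i) for i in range(k)]
    heapq.heapify(heap)
    upper = [[] for _ in range(k)]
    # Sort by size descending; the key breaks ties deterministically.
    ordered = sorted(lower_parts.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    for _, items in ordered:
        load, i = heapq.heappop(heap)
        upper[i].extend(items)
        heapq.heappush(heap, (load + len(items), i))
    return upper


words = ["apple", "avocado", "apricot", "banana", "blueberry",
         "cherry", "date", "dragonfruit", "dill", "elder"]
upper = greedy_merge(lower_level_partition(words), k=2)
print([len(p) for p in upper])
```

Each upper-level partition could then be handed to a separate worker that builds its own DA, which is where the load balance matters.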

Highlights

String storage and retrieval are fundamental operations in many fields, such as search engines, natural language processing, and artificial intelligence applications
Extensive experiments show that our proposed indexes can significantly improve the construction efficiency of the double array (DA) and outperform some other state-of-the-art competitors in many aspects
There are two common partitioning strategies available for parallel string processing: Balanced Partition (BP) and Balanced Partition with Partition Line
Two characters may contend for a single position in DA, leading to position competition (collisions)
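To make the notion of position competition concrete, the following is a minimal sequential sketch of a double array, not the paper's parallel construction. It uses the conventional `base`/`check` layout, where a transition from state `s` on a character with code `c` goes to `t = base[s] + c` and is valid only if `check[t] == s`. The terminator symbol, the character coding, and the function names `build_double_array` and `contains` are assumptions made for illustration. When a candidate slot is already owned by another state, the builder simply tries the next base value.

```python
END = "\0"  # assumed end-of-word marker, not part of any input word


def build_double_array(words):
    """Naive double-array construction; returns (base, check, code)."""
    # Build an ordinary dict-based trie first.
    trie = {}
    for w in words:
        node = trie
        for ch in w + END:
            node = node.setdefault(ch, {})
    # Character codes start at 1, and base values start at 1,
    # so no child can ever land on the root slot 0.
    alphabet = sorted({ch for w in words for ch in w} | {END})
    code = {ch: i + 1 for i, ch in enumerate(alphabet)}
    base, check = [0], [0]  # slot 0 is the root; -1 in check means free

    def ensure(n):
        while len(check) <= n:
            base.append(0)
            check.append(-1)

    stack = [(0, trie)]
    while stack:
        s, node = stack.pop()
        if not node:
            continue  # leaf state: no outgoing transitions
        codes = [code[ch] for ch in node]
        b = 1
        while True:
            ensure(b + max(codes))
            if all(check[b + c] == -1 for c in codes):
                break
            b += 1  # collision: some slot is taken by another state
        base[s] = b
        for ch, child in node.items():
            t = b + code[ch]
            check[t] = s  # claim the slot for state s
            stack.append((t, child))
    return base, check, code


def contains(da, word):
    """Exact-match lookup by following base/check transitions."""
    base, check, code = da
    s = 0
    for ch in word + END:
        c = code.get(ch)
        if c is None:
            return False
        t = base[s] + c
        if t >= len(check) or check[t] != s:
            return False
        s = t
    return True


da = build_double_array(["bachelor", "jar", "badge", "baby"])
print(contains(da, "badge"), contains(da, "bad"))
```

The linear search for a free base is the simplest possible policy; it is the part of construction that becomes expensive on large datasets and that partitioning aims to shrink.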

Summary

Introduction

String storage and retrieval are fundamental operations in many fields, such as search engines, natural language processing, and artificial intelligence applications. Just as the B+-tree is the representative database index for integers [1], the trie is one of the most common structures for string storage and retrieval and is extensively used in artificial intelligence [2,3], natural language processing [4], data mining [5], IP address searching [6,7], string similarity joining [8,9], and many other fields. The linked form is efficient in space overhead, but its retrieval is relatively slow. Both of them struggle to balance retrieval performance against storage overhead. Level-Ordered Unary Degree Sequence (LOUDS) [18,19] and double array (DA) [20] are the two most

Journal: Applied Sciences	Publication Date: Jul 30, 2020
Citations: 1	License type: CC BY 4.0
