MergedTrie: Efficient textual indexing.

Antonio Ferrández, Jesús Peral

doi:10.1371/journal.pone.0215288

Open Access

https://doi.org/10.1371/journal.pone.0215288

Journal: PLOS ONE
Publication Date: Apr 23, 2019
Citations: 3
License type: CC BY 4.0

Affiliation: University of Alicante, Software (Spain)

Abstract

Accessing and processing textual information (i.e., storing and querying a set of strings) is important for many current applications (e.g., information retrieval and social networks), particularly in the fields of Big Data and IoT, which require the handling of very large string dictionaries. Typical data structures for textual indexing are Hash Tables and some variants of Tries, such as the Double Trie (DT). In this paper, we propose an extension of the DT that we have called MergedTrie. It improves the DT compression by merging both Tries into a single one and by segmenting the indexed term into two fixed-length parts in order to balance the new Trie. Thus, a higher overlap of both prefixes and suffixes is obtained. Moreover, we propose a new implementation of Tries that achieves better compression rates than the Double-Array representation usually chosen for implementing Tries. Our proposal also overcomes the limitation of static implementations, which do not allow insertions and updates in their compact representations. Finally, our MergedTrie implementation experimentally improves on the efficiency of Hash Tables, DTs, the Double-Array, the Crit-bit, Directed Acyclic Word Graphs (DAWG), and Acyclic Deterministic Finite Automata (ADFA) data structures, while requiring less space than the original text to be indexed.
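To make the merging idea more concrete, the following is a minimal Python sketch, not the authors' implementation. It rests on several assumptions that the abstract does not confirm: each term is split at its midpoint, the second part is stored reversed so that common word endings share a path, both parts live in one shared trie, and a term is recorded as a link from the node ending its first part to the node ending its second part. The names `Node`, `MergedTrieSketch`, `_split`, and `_walk` are hypothetical, and the compact memory layout the paper proposes is not modelled here.

```python
class Node:
    """One trie node; `links` records which second-part nodes complete a term."""
    __slots__ = ("children", "links")

    def __init__(self):
        self.children = {}   # char -> Node
        self.links = set()   # Node objects that end a matching second part


class MergedTrieSketch:
    """Illustrative only: a single trie holding both halves of every term."""

    def __init__(self):
        self.root = Node()
        self.node_count = 1  # counts the root

    def _split(self, term):
        # Assumption: split at the midpoint and reverse the second part.
        mid = (len(term) + 1) // 2
        return term[:mid], term[mid:][::-1]

    def _walk(self, part, create):
        # Follow (and optionally create) the path for `part` from the root.
        node = self.root
        for ch in part:
            nxt = node.children.get(ch)
            if nxt is None:
                if not create:
                    return None
                nxt = Node()
                node.children[ch] = nxt
                self.node_count += 1
            node = nxt
        return node

    def insert(self, term):
        head, tail = self._split(term)
        head_node = self._walk(head, create=True)
        tail_node = self._walk(tail, create=True)
        head_node.links.add(tail_node)  # the link is what marks the term as stored

    def __contains__(self, term):
        head, tail = self._split(term)
        head_node = self._walk(head, create=False)
        tail_node = self._walk(tail, create=False)
        if head_node is None or tail_node is None:
            return False
        return tail_node in head_node.links


words = ["indexing", "indexed", "merging", "merged"]
index = MergedTrieSketch()
for w in words:
    index.insert(w)

print("indexing" in index)   # a stored term
print("indeged" in index)    # both halves exist as paths, but no link joins them
print(index.node_count, "nodes for", sum(len(w) for w in words), "characters")
```

In this sketch the link set is what distinguishes a stored term from an accidental pairing of an existing first part with an existing second part. Because insertion only adds nodes and links, terms can be added at any time, which loosely mirrors the dynamic-update property the abstract claims for the real structure.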
