RECUT: RE-Compressing partially Unordered Trees

Stefan Bottcher, Rita Hartel

doi:10.1109/bigdata.2018.8622261

Abstract

Large volumes of tree-structured data (such as JSON or XML) can be compressed into straight-line context-free (SLCF) tree grammars. For tree-structured data of size O(N), this compression ideally reduces the number of edges to O(log N), which shrinks the memory footprint and speeds up search algorithms. Nevertheless, grammar-based tree compression has two major limitations that result in an unnecessarily high number of edges: first, updates of SLCF grammars by path extraction increase the grammar size over time; second, a given sibling order prevents the compression of arbitrary combinations of siblings. The first limitation can be overcome by re-compression, i.e., finding a more strongly compressed SLCF tree grammar of the tree without decompressing the given grammar. The second limitation can be overcome by extending re-compression to unordered trees. RECUT provides re-compression not only for unordered trees but also for partially ordered trees. Furthermore, whenever parts of the data are unordered, as is the case for data-centric data, RECUT significantly improves grammar-based compression, further reducing the number of edges in the compressed data by a factor of up to 30.
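The effect of sibling order on compression can be illustrated with a much simpler technique than the one the abstract describes. The sketch below is not RECUT and does not build an SLCF grammar; it only shares identical subtrees (a minimal DAG) and counts the remaining edges, once with sibling order respected and once with siblings treated as unordered. All names (`node`, `tree_edges`, `dag_edges`, `doc`) are hypothetical and chosen for illustration.

```python
# Illustration only: subtree sharing (minimal DAG), not SLCF grammar compression.
# A tree is represented as (label, tuple_of_children).

def node(label, *children):
    """Build a tree node with the given label and children."""
    return (label, tuple(children))

def tree_edges(tree):
    """Count the edges of the uncompressed tree."""
    _, children = tree
    return sum(1 + tree_edges(child) for child in children)

def dag_edges(tree, unordered=False):
    """Count edges after sharing identical subtrees.

    With unordered=True, the ids of the children are sorted before a node
    is looked up, so subtrees that differ only in sibling order are shared.
    """
    ids = {}    # (label, child ids) -> id of the shared node
    edges = 0   # edges leaving distinct shared nodes

    def visit(t):
        nonlocal edges
        label, children = t
        child_ids = [visit(child) for child in children]
        if unordered:
            child_ids.sort()  # canonical order for unordered siblings
        key = (label, tuple(child_ids))
        if key not in ids:
            ids[key] = len(ids)
            edges += len(child_ids)
        return ids[key]

    visit(tree)
    return edges

# Two records holding the same fields in a different sibling order.
doc = node(
    "root",
    node("rec", node("a"), node("b")),
    node("rec", node("b"), node("a")),
)

print(tree_edges(doc))                 # 6
print(dag_edges(doc))                  # 6: sibling order blocks sharing
print(dag_edges(doc, unordered=True))  # 4: both records share one node
```

In this toy input, the two `rec` subtrees can only be shared once their sibling order is ignored, which mirrors, under these simplifying assumptions, the second limitation named in the abstract.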
