Abstract

Large amounts of tree-structured data (such as JSON or XML) can be compressed to straight-line context-free (SLCF) tree grammars. Compressing tree-structured data of size O(N) in this way ideally reduces the number of edges to O(log N), which shrinks the memory footprint and speeds up search algorithms. Nevertheless, grammar-based tree compression has two major limitations that result in an unnecessarily high number of edges. First, updating SLCF grammars by path extraction increases the grammar size over time. Second, a fixed sibling order prevents the compression of arbitrary combinations of siblings. The first limitation can be addressed by re-compression, i.e. finding a more strongly compressed SLCF tree grammar of the tree without decompressing the given grammar. The second limitation can be addressed by extending re-compression to unordered trees. RECUT provides re-compression not only for unordered trees but also for partially ordered trees. Furthermore, whenever parts of the data are unordered, as is the case for data-centric data, RECUT significantly improves grammar-based compression by further reducing the number of edges in the compressed data by a factor of up to 30.
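
To illustrate why ignoring sibling order can reduce the number of edges, the following minimal sketch shares identical subtrees in a DAG, which is a much simpler technique than SLCF grammar compression and is not the RECUT algorithm. The tree encoding as `(label, children)` pairs and the names `tree_edges` and `dag_edges` are assumptions made for this illustration only.

```python
# Illustrative sketch only: subtree sharing (DAG compression), not RECUT.
# A tree node is a hypothetical (label, children) pair.

def tree_edges(node):
    """Number of edges in the uncompressed tree."""
    _, children = node
    return len(children) + sum(tree_edges(c) for c in children)


def dag_edges(tree, unordered=False):
    """Number of edges when every distinct subtree is stored only once.

    With unordered=True, siblings are sorted into a canonical order first,
    so subtrees that differ only in sibling order become identical.
    """
    out_degree = {}  # canonical subtree key -> number of outgoing edges

    def canon(node):
        label, children = node
        keys = [canon(c) for c in children]
        if unordered:
            keys.sort()  # canonical sibling order
        key = (label, tuple(keys))
        out_degree[key] = len(keys)
        return key

    canon(tree)
    return sum(out_degree.values())


def leaf(label):
    return (label, [])


# Three records with the same fields, one of them in a different order.
rec_a = ("person", [leaf("name"), leaf("age"), leaf("city")])
rec_b = ("person", [leaf("city"), leaf("name"), leaf("age")])
doc = ("people", [rec_a, rec_b, rec_a])

print(tree_edges(doc))                 # plain tree
print(dag_edges(doc))                  # sibling order respected
print(dag_edges(doc, unordered=True))  # sibling order ignored
```

For this small document the sketch counts 12 edges in the plain tree, 9 when sibling order is respected, and 6 when it is ignored, because the reordered record can then be shared with the others.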

