Relative Suffix Trees.

Andrea Farruggia, Travis Gagie, Simon J Puglisi, Jouni Sirén, Gonzalo Navarro

doi:10.1093/comjnl/bxx108

Abstract

Suffix trees are one of the most versatile data structures in stringology, with many applications in bioinformatics. Their main drawback is their size, which can be tens of times larger than the input sequence. Much effort has been put into reducing the space usage, leading ultimately to compressed suffix trees. These compressed data structures can efficiently simulate the suffix tree, while using space proportional to a compressed representation of the sequence. In this work, we take a new approach to compressed suffix trees for repetitive sequence collections, such as collections of individual genomes. We compress the suffix trees of individual sequences relative to the suffix tree of a reference sequence. These relative data structures provide competitive time/space trade-offs, being almost as small as the smallest compressed suffix trees for repetitive collections, and competitive in time with the largest and fastest compressed suffix trees.

Highlights

The suffix tree [1] is one of the most powerful bioinformatic tools to answer complex queries on DNA and protein sequences [2,3,4].
The index is based on approximating the longest common subsequence (LCS) of BWT_R and BWT_S, where R is the reference sequence and S is the target sequence, and storing several structures based on the common subsequence.
We find a binary sequence B_R[1, |R|], which marks the common subsequence in R, and a strictly increasing integer sequence Y, which contains the positions of the common subsequence in S.
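
The idea in the last two highlights can be illustrated with a minimal Python sketch. It is not the authors' implementation: it builds the BWTs naively by sorting rotations, computes an exact LCS by dynamic programming where the paper approximates it, and assumes a `$` terminator that sorts before every other character. The names `bwt`, `lcs_pairs`, `relative_marks`, `marks_r` and `positions_s` are hypothetical; `marks_r` and `positions_s` only play a role analogous to a marking bitvector and an increasing position sequence, here taken over the two BWTs.

```python
def bwt(text):
    """Naive Burrows-Wheeler transform via sorted rotations.

    Assumes text ends with a unique sentinel (here '$') that is
    smaller than every other character.
    """
    n = len(text)
    order = sorted(range(n), key=lambda i: text[i:] + text[:i])
    # The last character of the rotation starting at i is text[i - 1].
    return "".join(text[i - 1] for i in order)


def lcs_pairs(a, b):
    """Return index pairs (i, j) of one exact LCS of a and b.

    Quadratic-time dynamic programming; the paper instead
    approximates the LCS, so this is only for small inputs.
    """
    n, m = len(a), len(b)
    # dp[i][j] = length of an LCS of a[i:] and b[j:]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    pairs = []
    i = j = 0
    while i < n and j < m:
        if a[i] == b[j]:
            pairs.append((i, j))
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs


def relative_marks(reference, target):
    """Mark a common subsequence of the two BWTs.

    Returns (bwt_r, bwt_s, marks_r, positions_s), where marks_r is a
    0/1 list over bwt_r and positions_s is a strictly increasing list
    of 0-based positions in bwt_s.
    """
    bwt_r, bwt_s = bwt(reference), bwt(target)
    pairs = lcs_pairs(bwt_r, bwt_s)
    marks_r = [0] * len(bwt_r)
    for i, _ in pairs:
        marks_r[i] = 1
    positions_s = [j for _, j in pairs]
    return bwt_r, bwt_s, marks_r, positions_s


if __name__ == "__main__":
    bwt_r, bwt_s, marks_r, positions_s = relative_marks("GATTACA$", "GATTAGA$")
    print(bwt_r, bwt_s)
    print(marks_r)
    print(positions_s)
```

Only the characters of the target BWT that fall outside the common subsequence would need to be stored explicitly, which is why a long common subsequence translates into a small relative index.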

Summary

INTRODUCTION

The suffix tree [1] is one of the most powerful bioinformatic tools to answer complex queries on DNA and protein sequences [2,3,4]. Rather than compressing the text directly, the current CSTs for repetitive collections [13, 15] apply grammar-based compression on the data structures that simulate the suffix tree. Structures for direct access [31, 32] and even for pattern matching [33] have been developed on top of RLZ. Another approach to compressing a repetitive collection while supporting interesting queries is to build an automaton that accepts the sequences in the collection, and index the state diagram as a directed acyclic graph (DAG); see, for example, [34,35,36] for recent discussions. This dependency makes these structures vulnerable to even a small change in even one sequence to an otherwise-conserved region, which could hamper their scalability.

One general CST or many individual CSTs

Our contribution

BACKGROUND

Full-text indexes

Compressed text indexes

Relative Lempel–Ziv

RELATIVE FMI

Basic index

Relative select

Full functionality

Finding a BWT-invariant subsequence

RELATIVE SUFFIX TREE

Relative LCP array

EXPERIMENTS

Indexes and their sizes

Query times

Synthetic collections

Suffix tree operations

Findings

DISCUSSION


Journal: The Computer Journal
Publication Date: Nov 21, 2017
Citations: 49
License type: CC BY 4.0


Similar Papers

Solving All-Pairs Suffix Prefix – Theory and Practice
Maan Haj Rachid ... Qutaibah Malluhi
01 Jan 2015

Dynamic dictionary matching and compressed suffix trees
23 Jan 2005

Faster repetition-aware compressed suffix trees based on Block Trees
Manuel Cáceres ... Gonzalo Navarro
Information and Computation | VOL. 285
28 Apr 2021

Compressed Suffix Trees with Full Functionality
Kunihiko Sadakane
Theory of Computing Systems | VOL. 41
07 Feb 2007
