Abstract

The r-index is a tool for compressed indexing of genomic databases for exact pattern matching. It can be used to completely align reads that perfectly match some part of a genome in the database, or to find seeds for reads that do not. This article shows how to download and install the programs ri-buildfasta and ri-align; how to call ri-buildfasta on a FASTA file to build an r-index for that file; and how to query that index with ri-align.


Summary

BACKGROUND

The Burrows–Wheeler Transform (BWT) (Burrows and Wheeler, 1994) and the Full-text index in Minute space (FM-index) (Ferragina and Manzini, 2005) are central to the most popular short-read aligners, such as Burrows-Wheeler Aligner (BWA) (Li and Durbin, 2009) and Bowtie (Langmead et al, 2009), but until recently it was not known how to apply these concepts effectively to whole genomic databases. Building on previous authors' work (Makinen et al, 2010), Gagie et al (2018) described how a fully functional variant of the FM-index for such a database could be stored in reasonable space: their variant, called the r-index, takes O(r) machine words, where r is the number of runs (maximal blocks of equal consecutive characters) in the BWT of the database. Prezza gave a preliminary implementation, which was significantly extended by Boucher et al (2019) and Kuhnle et al (2019). This article is meant as a brief guide to the extended implementation. For help troubleshooting or to provide feedback, please submit an issue to our GitHub page, which has more documentation.
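To make the quantity r concrete, the following is a minimal Python sketch of a toy BWT, a run counter, and an FM-index-style backward search that counts exact matches. It is an illustration only and not the r-index implementation: the function names (bwt, count_runs, count_occurrences) are hypothetical, the BWT is built by naively sorting rotations, and rank queries are answered by scanning the string, so it is suitable only for very small inputs.

```python
def bwt(text: str) -> str:
    """Toy Burrows-Wheeler Transform built by sorting all rotations."""
    assert "$" not in text
    s = text + "$"  # sentinel; '$' sorts before A, C, G, T and lowercase letters
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def count_runs(b: str) -> int:
    """Number of maximal runs of equal characters in b (the r in O(r))."""
    return sum(1 for i, c in enumerate(b) if i == 0 or c != b[i - 1])


def count_occurrences(b: str, pattern: str) -> int:
    """Count exact matches of pattern by backward search over the BWT b."""
    # C[c] = number of characters in b that are strictly smaller than c.
    freq = {}
    for c in b:
        freq[c] = freq.get(c, 0) + 1
    C, total = {}, 0
    for c in sorted(freq):
        C[c] = total
        total += freq[c]

    lo, hi = 0, len(b)  # half-open interval of matching suffixes
    for c in reversed(pattern):
        if c not in C:
            return 0
        # Naive rank: occurrences of c in a prefix of b (linear scan).
        lo = C[c] + b[:lo].count(c)
        hi = C[c] + b[:hi].count(c)
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    b = bwt("banana")
    print(b, count_runs(b), count_occurrences(b, "ana"))  # annb$aa 5 2

    # A repetitive toy "database": compare text length n with run count r.
    genome = "GATTACA" * 8
    bg = bwt(genome)
    print(len(bg), count_runs(bg))
```

For repetitive collections such as many genomes of the same species, r tends to grow much more slowly than the total length of the text, which is why an index of O(r) words can be practical where a plain FM-index is not.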

INSTALLATION
CONSTRUCTION
ALIGNMENT
INTERPRETING COUNTS
INTERPRETING LOCATIONS
Findings
FUNDING INFORMATION