Indexing Arbitrary-Length k-Mers in Sequencing Reads.

Tomasz Kowalski, Szymon Grabowski, Sebastian Deorowicz

doi:10.1371/journal.pone.0133198


Open Access

https://doi.org/10.1371/journal.pone.0133198


Abstract

We propose a lightweight data structure for indexing and querying collections of NGS read data in main memory. The data structure supports the interface proposed in the pioneering work by Philippe et al. for counting and locating k-mers in sequencing reads. Our solution, PgSA (pseudogenome suffix array), is based on finding overlapping reads and is competitive with existing algorithms in space use, query times, or both. The main applications of our index include variant calling, error correction, and analysis of reads from RNA-seq experiments.
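
To make the counting and locating queries concrete, here is a minimal sketch of a k-mer index over a small set of reads. It is not PgSA: it uses a plain, uncompressed suffix array over reads joined by a separator, whereas the abstract describes a pseudogenome built from overlapping reads. The function names (`build_index`, `locate`, `count`) and the `$` separator are assumptions made for illustration only.

```python
from bisect import bisect_right


def build_index(reads, sep="$"):
    """Join reads with a separator and build a plain suffix array (toy version)."""
    text = sep.join(reads) + sep
    # Naive construction by sorting suffixes; acceptable only for tiny inputs.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # starts[j] is the offset of read j inside the joined text.
    starts, pos = [], 0
    for read in reads:
        starts.append(pos)
        pos += len(read) + 1
    return text, sa, starts


def _bound(text, sa, kmer, upper):
    """Binary search for the first suffix whose k-prefix is >= kmer (or > kmer if upper)."""
    lo, hi, k = 0, len(sa), len(kmer)
    while lo < hi:
        mid = (lo + hi) // 2
        prefix = text[sa[mid]:sa[mid] + k]
        if prefix < kmer or (upper and prefix == kmer):
            lo = mid + 1
        else:
            hi = mid
    return lo


def locate(index, kmer):
    """Return sorted (read_id, offset) pairs for every occurrence of kmer."""
    text, sa, starts = index
    lo = _bound(text, sa, kmer, upper=False)
    hi = _bound(text, sa, kmer, upper=True)
    hits = []
    for p in sa[lo:hi]:
        read_id = bisect_right(starts, p) - 1  # read containing text position p
        hits.append((read_id, p - starts[read_id]))
    return sorted(hits)


def count(index, kmer):
    """Return the number of occurrences of kmer across all reads."""
    text, sa, _ = index
    return _bound(text, sa, kmer, upper=True) - _bound(text, sa, kmer, upper=False)


index = build_index(["ACGTAC", "GTACGT", "TTACG"])
```

Because the query k-mer never contains the separator, a match cannot span two reads, and the same index answers queries for any k without being rebuilt. For example, `locate(index, "ACG")` reports one occurrence in each of the three reads.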

Highlights

Genome sequencing costs recently dropped below 5 thousand U.S. dollars per human genome at about 30-fold coverage [1]
Gk arrays (GkA) are faster than CGkA, yet require at least 3 times more space
In the Q4 query, given by position, GkA is a clear winner in speed

Summary

Introduction

Genome sequencing costs recently dropped below 5 thousand U.S. dollars per human genome at about 30-fold coverage [1]. This results in enormous amounts of sequencing data that have to be processed. The data are mapped onto reference genomes, and variant calling algorithms are used to identify the mutations present in the sequenced genomes. Since mapping requires fast search over reference genomes, many indexing structures for genomes have been adopted or invented. The situation changed with the advent of much more compact (compressed) index data structures. One recent successful example is the MuGI multi-genome index [9], which can index 1092 human genomes in less than 10 GB of memory.



Journal: PloS one
Publication Date: Jul 16, 2015
Citations: 39
License type: CC BY 4.0
