SAKE: Strobemer-assisted k-mer extraction

Miika Leinonen, Leena Salmela

doi:10.1371/journal.pone.0294415

Abstract

K-mer-based analysis plays an important role in many bioinformatics applications, such as de novo assembly, sequencing error correction, and genotyping. To take full advantage of such methods, the k-mer content of a read set must be captured as accurately as possible. Long k-mers are often preferred because they can be uniquely associated with a specific genomic region. Unfortunately, standard exact k-mer counting methods cannot reliably extract long k-mers from high-error-rate reads. We propose SAKE, a method that extracts long k-mers from high-error-rate reads by using strobemers and by generating consensus k-mers through partial order alignment. Our experiments show that on simulated data with error rates of up to 6%, SAKE extracts 97-mers with over 90% recall, whereas the recall of DSK, an exact k-mer counter, drops below 20%. The precision of SAKE remains similar to that of DSK. On real bacterial data, SAKE retrieves 97-mers with a recall of over 90% and slightly lower precision than DSK, while the recall of DSK drops to 50%. We show that SAKE can extract more k-mers from uncorrected high-error-rate reads than exact k-mer counting. However, exact k-mer counters run on corrected reads can extract slightly more k-mers than SAKE run on uncorrected reads.
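The argument against exact counting can be made concrete. Assuming each base of a read is wrong independently with probability e (a simplification: real long-read errors are not independent and include indels), a single k-mer occurrence is error-free with probability (1 - e)^k, which for e = 0.06 and k = 97 is only about 0.25%. Strobemers relax this requirement, since only the short strobes have to match, so an error in the gap between them can leave the strobemer unchanged. The Python sketch below illustrates both points. The names p_error_free and randstrobes2, the CRC32 stand-in hash, the prime modulus q and all parameter values are hypothetical choices; the second strobe is picked by minimising (h(m1) + h(m2)) mod q, which is our reading of the order-2 randstrobe rule. It is not SAKE's implementation, and SAKE's partial order alignment step for building consensus k-mers is not shown.

```python
import random
import zlib
from typing import Iterator, Tuple


def p_error_free(k: int, error_rate: float) -> float:
    """Probability that one k-mer occurrence contains no sequencing error,
    assuming each base is wrong independently with probability error_rate."""
    return (1.0 - error_rate) ** k


def _hash(kmer: str) -> int:
    # Deterministic stand-in hash (CRC32); an assumption, not SAKE's hash.
    return zlib.crc32(kmer.encode("ascii"))


def randstrobes2(seq: str, k: int, w_min: int, w_max: int,
                 q: int = 2**31 - 1) -> Iterator[Tuple[int, int, str]]:
    """Order-2 randstrobes (simplified sketch).

    For each start i, the first strobe is seq[i:i+k]; the second is the k-mer
    starting at some j in [i + w_min, i + w_max] that minimises
    (h(m1) + h(m2)) mod q. Yields (i, j, m1 + m2). Starts whose window would
    run past the end of seq are skipped.
    """
    if not 0 < w_min <= w_max:
        raise ValueError("need 0 < w_min <= w_max")
    for i in range(len(seq) - k - w_max + 1):
        m1 = seq[i:i + k]
        h1 = _hash(m1)
        # min() returns the first position on ties, so the choice is deterministic.
        j = min(range(i + w_min, i + w_max + 1),
                key=lambda p: (h1 + _hash(seq[p:p + k])) % q)
        yield i, j, m1 + seq[j:j + k]


if __name__ == "__main__":
    print(f"{p_error_free(97, 0.06):.4f}")  # 0.0025

    rng = random.Random(1)
    ref = "".join(rng.choice("ACGT") for _ in range(200))
    pos = 30  # one substitution between the strobes of the strobemer at i = 0
    read = ref[:pos] + ("A" if ref[pos] != "A" else "C") + ref[pos + 1:]

    k, w_min, w_max = 15, 40, 60
    before = {i: s for i, _, s in randstrobes2(ref, k, w_min, w_max)}
    after = {i: s for i, _, s in randstrobes2(read, k, w_min, w_max)}
    print(before[0] == after[0])  # True
```

Running the script prints 0.0025 and then True: position 30 lies between the first strobe (positions 0 to 14) and the window of candidate second strobes (positions 40 to 74), so the strobemer starting at 0 survives the substitution. An error inside either strobe, or anywhere in the candidate window, can still change or destroy a strobemer.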

Journal: PLOS ONE
Publication date: Nov 29, 2023
License: CC BY 4.0