ntHash2: recursive spaced seed hashing for nucleotide sequences

Parham Kazemi, Inanç Birol, Hamid Mohamadi, René L Warren, Vladimir Nikolić, Johnathan Wong

doi:10.1093/bioinformatics/btac564

Abstract

Motivation: Spaced seeds are robust alternatives to k-mers in analyzing nucleotide sequences with high base mismatch rates. Hashing is also crucial for efficiently storing abundant sequence data. Here, we introduce ntHash2, a fast algorithm for spaced seed hashing that can be integrated into various bioinformatics tools for efficient sequence analysis with applications in genome research.

Results: ntHash2 is up to 2.1× faster at hashing various spaced seeds than the previous version and 3.8× faster than conventional hashing algorithms with naïve adaptation. Additionally, we reduced the collision rate of ntHash for longer k-mer lengths and improved the uniformity of the hash distribution by modifying the canonical hashing mechanism.

Availability and implementation: ntHash2 is freely available online at github.com/bcgsc/ntHash under an MIT license.

Supplementary information: Supplementary data are available at Bioinformatics online.
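To make the terms in the abstract concrete, the following is a minimal sketch of what hashing a spaced seed means. It is not the ntHash2 algorithm or its API: the function names (`apply_seed`, `hash64`, `spaced_seed_hashes`) are hypothetical, and BLAKE2b from Python's standard library stands in for the hash function. Each window is rehashed from scratch, which roughly corresponds to the "naïve adaptation" of a conventional hash mentioned above, whereas ntHash2 is described as recursive. The seed is assumed to be a string of `1` (care) and `0` (don't care) positions.

```python
import hashlib

# Assumption: sequences contain only the uppercase bases A, C, G, T.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a nucleotide string."""
    return seq.translate(COMPLEMENT)[::-1]


def apply_seed(window: str, seed: str) -> str:
    """Keep bases at '1' positions of the seed and mask '0' positions."""
    return "".join(b if s == "1" else "-" for b, s in zip(window, seed))


def hash64(text: str) -> int:
    """Stand-in 64-bit hash (BLAKE2b); ntHash2 uses its own hash function."""
    digest = hashlib.blake2b(text.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def spaced_seed_hashes(seq: str, seed: str) -> list[int]:
    """Hash every window of len(seed) bases, ignoring don't-care positions.

    Each value is 'canonical' in the simple sense that the smaller of the
    forward and reverse-complement hashes is kept, so a window and its
    reverse complement get the same value.
    """
    k = len(seed)
    hashes = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        fwd = hash64(apply_seed(window, seed))
        rev = hash64(apply_seed(reverse_complement(window), seed))
        hashes.append(min(fwd, rev))
    return hashes


if __name__ == "__main__":
    seed = "110011"
    # The two sequences differ only at a don't-care position of the seed.
    print(spaced_seed_hashes("ACGTAC", seed) == spaced_seed_hashes("ACTTAC", seed))
```

In this sketch a mismatch at a masked position does not change the hash, which is the property that makes spaced seeds tolerant of base mismatches. Taking the minimum of the two strand hashes is only one simple way to obtain a strand-independent value; the abstract states that ntHash2 modifies its canonical hashing mechanism but does not describe how.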

Journal: Bioinformatics
Publication Date: Aug 24, 2022
Citations: 9
License type: CC BY 4.0
