Fast alignment-free sequence comparison using spaced-word frequencies

Chris-Andre Leimeister, Sebastian Horwege, Sebastian Lindner, Marcus Boden, Burkhard Morgenstern

doi:10.1093/bioinformatics/btu177

Open Access

Journal: Bioinformatics
Publication Date: Apr 3, 2014
Citations: 145
License type: CC BY 3.0

Affiliation: University of Göttingen

Abstract

Motivation: Alignment-free methods for sequence comparison are increasingly used for genome analysis and phylogeny reconstruction; they circumvent various difficulties of traditional alignment-based approaches. In particular, alignment-free methods are much faster than pairwise or multiple alignments. They are, however, less accurate than methods based on sequence alignment. Most alignment-free approaches work by comparing the word composition of sequences. A well-known problem with these methods is that neighbouring word matches are far from independent.

Results: To reduce the statistical dependency between adjacent word matches, we propose to use ‘spaced words’, defined by patterns of ‘match’ and ‘don’t care’ positions, for alignment-free sequence comparison. We describe a fast implementation of this approach using recursive hashing and bit operations, and we show that further improvements can be achieved by using multiple patterns instead of single patterns. To evaluate our approach, we use spaced-word frequencies as a basis for fast phylogeny reconstruction. Using real-world and simulated sequence data, we demonstrate that our multiple-pattern approach produces better phylogenies than approaches relying on contiguous words.

Availability and implementation: Our program is freely available at http://spaced.gobics.de/.

Contact: chris.leimeister@stud.uni-goettingen.de

Supplementary information: Supplementary data are available at Bioinformatics online.
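
The spaced-word idea from the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it uses plain string slicing rather than the recursive hashing and bit operations the abstract mentions, and the function names, the '1'/'0' pattern encoding, and the choice of Euclidean distance are all assumptions made for illustration (the abstract does not name a distance measure).

```python
from collections import Counter
from math import sqrt


def spaced_word_counts(seq, pattern):
    """Count spaced words in seq.

    pattern is a string of '1' (match position) and '0' (don't-care
    position); this encoding is an assumption made for this sketch.
    """
    if not pattern or set(pattern) - {"0", "1"}:
        raise ValueError("pattern must be a non-empty string of '0' and '1'")
    match_positions = [i for i, c in enumerate(pattern) if c == "1"]
    counts = Counter()
    # Slide the pattern along the sequence and keep only the match positions.
    for start in range(len(seq) - len(pattern) + 1):
        word = "".join(seq[start + i] for i in match_positions)
        counts[word] += 1
    return counts


def multi_pattern_counts(seq, patterns):
    """Combine counts from several patterns, keyed by (pattern, word)."""
    counts = Counter()
    for p in patterns:
        for word, n in spaced_word_counts(seq, p).items():
            counts[(p, word)] += n
    return counts


def euclidean_distance(a, b):
    """Euclidean distance between two count vectors (illustrative choice)."""
    keys = set(a) | set(b)
    # Counter returns 0 for words that are missing from one sequence.
    return sqrt(sum((a[k] - b[k]) ** 2 for k in keys))


if __name__ == "__main__":
    s1 = "ACGTACGTTGCA"
    s2 = "ACGAACGTAGCA"
    patterns = ["1011", "1101"]
    d = euclidean_distance(multi_pattern_counts(s1, patterns),
                           multi_pattern_counts(s2, patterns))
    print(f"distance: {d:.3f}")
```

With the pattern `101`, the window `ACG` yields the spaced word `AG`: the middle character falls on a don't-care position and is ignored. Pairwise distances of this kind could then be fed into a distance-based tree-building method for phylogeny reconstruction.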
