Abstract

In this article, we present a user-friendly web interface for two alignment-free sequence-comparison methods that we recently developed. Most alignment-free methods rely on exact word matches to estimate pairwise similarities or distances between the input sequences. By contrast, our new algorithms are based on inexact word matches. The first approach uses the relative frequencies of so-called spaced words in the input sequences, i.e. words containing ‘don't care’ or ‘wildcard’ symbols at certain pre-defined positions. Various distance measures can then be defined on the sequences based on differences in their spaced-word composition. The second approach defines the distance between two sequences by estimating, for each position in the first sequence, the length of the longest substring starting at this position that also occurs in the second sequence with up to k mismatches. Both approaches take a set of deoxyribonucleic acid (DNA) or protein sequences as input and return a matrix of pairwise distance values that can be used as a starting point for clustering algorithms or distance-based phylogeny reconstruction. The two alignment-free programmes are accessible through web interfaces at the Göttingen Bioinformatics Compute Server (GOBICS), at http://spaced.gobics.de and http://kmacs.gobics.de, and their source code can be downloaded.
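To make the second approach more concrete, the following is a minimal brute-force sketch in Python of the quantity it is built on: for each position in the first sequence, the length of the longest substring starting there that also occurs in the second sequence with up to k mismatches. The function names are hypothetical, the quadratic search is chosen for readability and is not the programme's actual implementation, and the step that turns these lengths into a distance value is not shown because the abstract does not specify it.

```python
def longest_k_mismatch_match(s1, i, s2, k):
    """Length of the longest substring of s1 starting at position i that
    matches a substring of s2 with at most k mismatches (brute force)."""
    best = 0
    for j in range(len(s2)):
        mismatches = 0
        length = 0
        # Extend the match from (i, j) until one sequence ends
        # or the (k+1)-th mismatch is reached.
        while i + length < len(s1) and j + length < len(s2):
            if s1[i + length] != s2[j + length]:
                mismatches += 1
                if mismatches > k:
                    break
            length += 1
        best = max(best, length)
    return best


def average_k_mismatch_length(s1, s2, k):
    """Average, over all positions of s1, of the longest k-mismatch
    match length with respect to s2."""
    if not s1:
        return 0.0
    total = sum(longest_k_mismatch_match(s1, i, s2, k) for i in range(len(s1)))
    return total / len(s1)


if __name__ == "__main__":
    print(average_k_mismatch_length("ACGT", "ACGA", k=0))
    print(average_k_mismatch_length("ACGT", "ACGA", k=1))
```

Longer average match lengths indicate more similar sequences. Note that the measure is not symmetric in the two sequences, so a symmetric distance would have to combine both directions.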

Highlights

  • Comparative sequence analysis and phylogeny reconstruction are traditionally based on pairwise or multiple sequence alignments

  • Both approaches take a set of deoxyribonucleic acid (DNA) or protein sequences as input and return a matrix of pairwise distance values that can be used as a starting point for clustering algorithms or distance-based phylogeny reconstruction

  • After calculating the relative frequencies of all spaced words according to the fixed pattern P, our programme can use different distance measures to define pairwise distances among the input sequences based on their relative spaced-word frequencies
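The spaced-word step described in the last highlight can be sketched as follows. This is an illustrative Python sketch, not the programme's own code: the function names are hypothetical, the pattern P is assumed to be written as a string of '1' (match position) and '0' (don't-care position), and the Euclidean distance is used only as one example of a possible distance measure on the frequency vectors.

```python
import math
from collections import Counter
from itertools import combinations


def spaced_word_frequencies(seq, pattern):
    """Relative frequencies of spaced words in seq for a fixed pattern.
    Assumed convention: '1' = match position, '0' = don't-care position.
    A spaced word is identified by its characters at the match positions."""
    match_positions = [p for p, c in enumerate(pattern) if c == "1"]
    counts = Counter(
        "".join(seq[i + p] for p in match_positions)
        for i in range(len(seq) - len(pattern) + 1)
    )
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()} if total else {}


def euclidean_distance(f1, f2):
    """Euclidean distance between two sparse frequency vectors
    (an example measure; other measures could be substituted)."""
    words = set(f1) | set(f2)
    return math.sqrt(sum((f1.get(w, 0.0) - f2.get(w, 0.0)) ** 2 for w in words))


def distance_matrix(seqs, pattern):
    """Symmetric matrix of pairwise distances between the input sequences."""
    freqs = [spaced_word_frequencies(s, pattern) for s in seqs]
    n = len(seqs)
    matrix = [[0.0] * n for _ in range(n)]
    for a, b in combinations(range(n), 2):
        matrix[a][b] = matrix[b][a] = euclidean_distance(freqs[a], freqs[b])
    return matrix


if __name__ == "__main__":
    for row in distance_matrix(["ACGTACGT", "ACGTTCGT", "TTTTGGGG"], "1101"):
        print(row)
```

The resulting matrix has the shape expected by distance-based clustering or phylogeny tools: zeros on the diagonal and one symmetric entry per pair of sequences.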

Summary

Introduction

Comparative sequence analysis and phylogeny reconstruction are traditionally based on pairwise or multiple sequence alignments. Most alignment-free methods instead rely on exact word matches to estimate pairwise similarities or distances between the input sequences. With spaced words, i.e. words containing wildcard symbols at pre-defined positions, various distance measures can be defined on sequences based on differences in their spaced-word composition.

