Abstract
Proteins evolve by mutations and natural selection. The network of sequence similarities is a rich source for mining homologous relationships that inform on protein structure and function. There are many servers available to browse the network of homology relationships but one has to wait up to a minute for results. The SANSparallel webserver provides protein sequence database searches with immediate response and professional alignment visualization by third-party software. The output is a list, pairwise alignment or stacked alignment of sequence-similar proteins from Uniprot, UniRef90/50, Swissprot or Protein Data Bank. The stacked alignments are viewed in Jalview or as sequence logos. The database search uses the suffix array neighborhood search (SANS) method, which has been re-implemented as a client-server, improved and parallelized. The method is extremely fast and as sensitive as BLAST above 50% sequence identity. Benchmarks show that the method is highly competitive compared to previously published fast database search programs: UBLAST, DIAMOND, LAST, LAMBDA, RAPSEARCH2 and BLAT. The web server can be accessed interactively or programmatically at http://ekhidna2.biocenter.helsinki.fi/cgi-bin/sans/sans.cgi. It can be used to make protein functional annotation pipelines more efficient, and it is useful in interactive exploration of the detailed evidence supporting the annotation of particular proteins of interest.
Highlights
Recent years have witnessed a remarkable growth in the number of sequences
LAMBDA outputs at most 500 hits; comparisons are shown for 1000 hits and 500 hits.
The performance of all methods is quite similar above 50% sequence identity; differences are mainly seen in the detection of remote homologs below 50% sequence identity (Figure 2).
Summary
Recent years have witnessed a remarkable growth in the number of sequences. This has made database searches [1,2,3,4] take longer and longer, and has forced free computing services and pre-computed databases to close down or resort to crowd-sourcing [5,6,7]. SANSparallel is a re-implemented, improved and parallelized version of our previous suffix array neighborhood search (SANS) algorithm [9]. It belongs to a new generation of fast database search programs that index the database so that short words (seeds) matching the query can be found efficiently and independently of database size [10,11,12,13,14,15]. On the other hand, spaced seeds and reduced alphabets have been introduced to increase sensitivity [16]. Programs implementing these techniques are orders of magnitude faster than BLAST. We present more benchmarking and show that SANSparallel is highly competitive in comparison with recently published programs.
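To make the indexing idea concrete, the following is a toy sketch of a neighborhood lookup in a suffix array. It is not the SANSparallel implementation: the function names (`build_index`, `neighborhood_votes`), the `window` parameter, the `$` separator and the simple vote counting are all assumptions chosen for illustration, and the suffix array is built naively by sorting, which is only workable for tiny inputs.

```python
from bisect import bisect_left
from collections import Counter


def build_index(proteins):
    """Concatenate the database and build a naive suffix array (toy only)."""
    text = ""
    owner = []  # owner[i] = index of the protein that character i belongs to
    for pid, seq in enumerate(proteins):
        text += seq + "$"  # hypothetical separator, sorts before all letters
        owner.extend([pid] * (len(seq) + 1))
    # Naive construction by sorting all suffixes; real tools use faster methods.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    return text, owner, sa


def neighborhood_votes(query, text, owner, sa, window=2):
    """For each query suffix, find where it would sit in the suffix array and
    give one vote to the protein owning each suffix within `window` ranks."""
    suffixes = [text[i:] for i in sa]  # materialized for bisect; toy only
    votes = Counter()
    for q in range(len(query)):
        pos = bisect_left(suffixes, query[q:])  # binary search for the rank
        lo, hi = max(0, pos - window), min(len(sa), pos + window)
        for rank in range(lo, hi):
            start = sa[rank]
            if text[start] == "$":  # skip suffixes that start at a separator
                continue
            votes[owner[start]] += 1
    return votes


if __name__ == "__main__":
    db = ["MKTAYIAKQR", "GSHMLEDPVD"]  # made-up sequences
    text, owner, sa = build_index(db)
    print(neighborhood_votes("TAYIAK", text, owner, sa).most_common())
```

In this sketch each lookup is a binary search over the sorted suffixes, so the cost per query suffix grows only logarithmically with the size of the indexed database. Candidate proteins collected this way would still need to be aligned and scored before being reported as hits.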