A fast computation of pairwise sequence alignment scores between a protein and a set of single-locus variants of another protein

Yongwook Choi

doi:10.1145/2382936.2382989

Abstract

We recently developed a new algorithm, PROVEAN (Protein Variation Effect Analyzer), for predicting the functional effect of protein sequence variations, including single amino acid substitutions and small insertions and deletions [2]. The prediction is based on the change, caused by a given variation, in the similarity of the query sequence to a set of related protein sequences. To make this prediction, the algorithm must compute a semi-global pairwise sequence alignment score between the query sequence and each of the related sequences. Using dynamic programming, it takes O(n · m) time to compute the alignment score between a query sequence Q of length n and a related sequence S of length m. Thus, given l different variations in Q, a naive approach that realigns each variant query sequence to S from scratch takes O(l · n · m) time. In this paper, we present a new approach that computes the pairwise alignment scores for all l variations in O((n + l) · m) time when the length of each variation is bounded by a constant. The approach reuses the solutions to overlapping subproblems that the dynamic programming computation already produces. Our algorithm has been used to build a new database of precomputed prediction scores for all possible single amino acid substitutions, single amino acid insertions, and deletions of up to 10 amino acids in about 91K human proteins (including isoforms), a setting in which l becomes very large, that is, l = O(n). The PROVEAN source code and web server are available at http://provean.jcvi.org.
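To illustrate how reusing dynamic programming subproblems can yield the O((n + l) · m) bound, the following is a minimal sketch and not the PROVEAN implementation. It makes several simplifying assumptions that differ from the setting described above: global rather than semi-global alignment, a linear gap penalty, and a toy match/mismatch scoring function. All function names (`forward_table`, `backward_table`, `substitution_score`, `deletion_score`, `simple_score`) are hypothetical. The idea is to fill one forward table and one backward table in O(n · m) time, and then score each variant in O(m) time by combining a row of the forward table with a row of the backward table.

```python
from typing import Callable, List

Score = Callable[[str, str], int]


def forward_table(q: str, s: str, score: Score, gap: int) -> List[List[int]]:
    """F[i][j] = best global alignment score of q[:i] against s[:j]."""
    n, m = len(q), len(s)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + score(q[i - 1], s[j - 1]),  # align residues
                F[i - 1][j] + gap,                            # q[i-1] vs gap
                F[i][j - 1] + gap,                            # s[j-1] vs gap
            )
    return F


def backward_table(q: str, s: str, score: Score, gap: int) -> List[List[int]]:
    """B[i][j] = best global alignment score of q[i:] against s[j:]."""
    n, m = len(q), len(s)
    # Aligning suffixes is the same as aligning prefixes of the reversed strings.
    R = forward_table(q[::-1], s[::-1], score, gap)
    return [[R[n - i][m - j] for j in range(m + 1)] for i in range(n + 1)]


def substitution_score(F, B, s: str, p: int, c: str, score: Score, gap: int) -> int:
    """Score of (q with q[p] replaced by c) against s, in O(m) time."""
    m = len(s)
    # Case 1: the new residue is aligned to a gap.
    best = max(F[p][j] + gap + B[p + 1][j] for j in range(m + 1))
    # Case 2: the new residue is aligned to s[j].
    for j in range(m):
        best = max(best, F[p][j] + score(c, s[j]) + B[p + 1][j + 1])
    return best


def deletion_score(F, B, p: int, k: int) -> int:
    """Score of (q with q[p:p+k] deleted) against s, in O(m) time."""
    # Every alignment of the shortened query passes through some cell (p, j).
    return max(F[p][j] + B[p + k][j] for j in range(len(F[p])))


def simple_score(a: str, b: str) -> int:
    """Toy scoring function (assumption): +2 for a match, -1 for a mismatch."""
    return 2 if a == b else -1


if __name__ == "__main__":
    q, s, gap = "HEAGAWGHEE", "PAWHEAE", -2
    F = forward_table(q, s, simple_score, gap)
    B = backward_table(q, s, simple_score, gap)
    print("wild type:", F[len(q)][len(s)])
    print("G4W substitution:", substitution_score(F, B, s, 3, "W", simple_score, gap))
    print("delete q[3:5]:", deletion_score(F, B, 3, 2))
```

Under these assumptions the two tables cost O(n · m) once, and each of the l variants costs O(m), which gives O((n + l) · m) overall. Extending the sketch to affine gap penalties or semi-global alignment would require additional state per cell and different boundary conditions, which are not shown here.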
