Abstract

As next-generation sequencing projects generate massive genome-wide sequence variation data, bioinformatics tools are being developed to provide computational predictions of the functional effects of sequence variations and to narrow down the search for causal variants underlying disease phenotypes. Different classes of sequence variations at the nucleotide level are involved in human diseases, including substitutions, insertions, deletions, frameshifts, and nonsense mutations. Frameshifts and nonsense mutations are likely to impair protein function. Existing prediction tools focus primarily on the deleterious effects of single amino acid substitutions, which they assess by examining amino acid conservation at the position of interest among related sequences; this approach is not directly applicable to insertions or deletions. Here, we introduce a versatile alignment-based score as a new metric to predict the damaging effects not only of single amino acid substitutions but also of in-frame insertions, deletions, and multiple amino acid substitutions. This alignment-based score measures the change in sequence similarity of a query sequence to a protein sequence homolog before and after the introduction of an amino acid variation to the query sequence. Our results show that the scoring scheme performs well in separating disease-associated variants (n = 21,662) from common polymorphisms (n = 37,022) among UniProt human protein variations, and in separating deleterious variants (n = 15,179) from neutral variants (n = 17,891) among UniProt non-human protein variations. For both the human and non-human datasets, the area under the receiver operating characteristic curve (AUC) is ∼0.85. We also observed that the alignment-based score correlates with the deleteriousness of a sequence variation.
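The AUC reported above can be read as the probability that a randomly chosen deleterious (or disease-associated) variant receives a more damaging score than a randomly chosen neutral one, with ties counting as one half. A minimal sketch of that computation in Python follows. It uses the rank-sum (Mann-Whitney U) form so that it scales to datasets of tens of thousands of variants; the function name `roc_auc` and the example scores are hypothetical, and the scores are assumed to be oriented so that higher means more damaging.

```python
from typing import Sequence


def roc_auc(positive_scores: Sequence[float], negative_scores: Sequence[float]) -> float:
    """AUC via the Mann-Whitney U statistic; assumes higher score = more damaging."""
    n_pos, n_neg = len(positive_scores), len(negative_scores)
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes need at least one score")
    # Pool the scores with class labels (1 = positive, 0 = negative) and sort by score.
    labeled = sorted([(s, 1) for s in positive_scores] + [(s, 0) for s in negative_scores],
                     key=lambda t: t[0])
    rank_sum_pos = 0.0
    i = 0
    while i < len(labeled):
        # Find the run of tied scores labeled[i..j] and give each the average 1-based rank.
        j = i
        while j + 1 < len(labeled) and labeled[j + 1][0] == labeled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        rank_sum_pos += avg_rank * sum(label for _, label in labeled[i:j + 1])
        i = j + 1
    # U counts (positive, negative) pairs where the positive ranks higher, ties as 0.5.
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


# Hypothetical delta scores (lower = larger loss of similarity), negated so higher = more damaging.
disease = [-8.1, -4.3, -2.9, -0.5]
common = [-3.0, -1.2, 0.0, 0.4]
print(roc_auc([-s for s in disease], [-s for s in common]))  # 0.8125
```

If lower scores indicate more damaging variants, as with a delta score defined as similarity after the variation minus similarity before it, negate them before calling the function, as in the usage lines.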
In summary, we have developed a new algorithm, PROVEAN (Protein Variation Effect Analyzer), which provides a generalized approach to predicting the functional effects of protein sequence variations, including single or multiple amino acid substitutions and in-frame insertions and deletions. The PROVEAN tool is available online at http://provean.jcvi.org.
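Following the definition in the abstract (the change in similarity of the query to a homolog before and after the variation is introduced), a delta alignment score for one homolog can be sketched as the alignment score of the variant sequence minus that of the original query. This is a minimal illustration, not PROVEAN's implementation: it uses a plain Needleman-Wunsch global alignment with hypothetical toy scoring (match +2, mismatch -1, linear gap -2), whereas a realistic setup would use a substitution matrix such as BLOSUM62 with affine gap penalties. How scores from many homologs are combined is not covered in this summary, so the sketch stops at a single pair. Function names and sequences are hypothetical.

```python
def global_alignment_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Needleman-Wunsch global alignment score with a linear gap penalty (toy scoring)."""
    # prev[j] holds the best score for aligning the previous prefix of `a` with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Best of: a[i-1] aligned to b[j-1], a[i-1] against a gap, or b[j-1] against a gap.
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[len(b)]


def delta_alignment_score(query: str, variant: str, homolog: str) -> int:
    """Alignment score after the variation minus the score before it, against one homolog."""
    return global_alignment_score(variant, homolog) - global_alignment_score(query, homolog)


homolog = "MKTAYIAKQR"  # hypothetical homolog sequence
query = "MKTAYIAKQR"    # hypothetical query, identical to the homolog for clarity

print(delta_alignment_score(query, "MKTADIAKQR", homolog))   # substitution Y5D: -3
print(delta_alignment_score(query, "MKTADIAKQE", homolog))   # substitutions Y5D and R10E: -6
print(delta_alignment_score(query, "MKTAAKQR", homolog))     # deletion of Y5-I6: -8
print(delta_alignment_score(query, "MKTAGYIAKQR", homolog))  # insertion of G after A4: -2
```

Because the variant is simply a different query string, the same function scores a single substitution, multiple substitutions, a multi-residue deletion, and an insertion. This is the property that lets an alignment-based score cover in-frame indels, which a per-position conservation score cannot.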

Highlights

  • Recent advances in high-throughput technologies have generated massive amounts of genome sequence and genotype data for humans and a number of model species.

  • Delta alignment score: In pairwise sequence alignments, alignment scores can be used as a measure of sequence similarity to assess how likely a sequence pair is to be homologous or related. In keeping with this idea, a change in the alignment score caused by an amino acid variation can be interpreted as the impact of the variation on protein function.

  • … have been used as a metric to measure amino acid conservation, and amino acid variants that correspond to nonconserved substitutions are predicted to be deleterious [21,22].
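The conservation-based idea in the last highlight can be illustrated on a single alignment column. The sketch below is a generic frequency-threshold rule, not the scoring used by the methods cited as [21,22]; the function names, the `-` gap character, the example column, and the 0.05 cutoff are assumptions for illustration.

```python
from collections import Counter
from typing import Sequence


def residue_frequency(column: Sequence[str], residue: str) -> float:
    """Fraction of non-gap residues in one alignment column that equal `residue`."""
    residues = [r for r in column if r != "-"]
    if not residues:
        return 0.0
    return Counter(residues)[residue] / len(residues)


def predict_substitution(column: Sequence[str], variant_residue: str, cutoff: float = 0.05) -> str:
    """Call a substitution deleterious when the variant residue is rare at this position."""
    if residue_frequency(column, variant_residue) < cutoff:
        return "deleterious"
    return "tolerated"


# Hypothetical column: residues from 11 aligned homologs at the position of interest.
column = "LLLLLLLLIV-"
print(predict_substitution(column, "I"))  # tolerated (1 of 10 non-gap residues)
print(predict_substitution(column, "D"))  # deleterious (never observed)
```

Because the rule looks up exactly one aligned position, an insertion or deletion that adds, removes, or shifts residues has no single column to score, which is the limitation noted in the abstract.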

Introduction

Recent advances in high-throughput technologies have generated massive amounts of genome sequence and genotype data for humans and a number of model species. The enormous amount of sequence variation data generated by large-scale projects necessitates computational approaches to assess the potential impact of amino acid changes on gene function. A number of computational methods based on evolutionary principles have been developed to predict the effect of coding variants on protein function, including SIFT [5], PolyPhen-2 [6], Mutation Assessor [7], MAPP [8], PANTHER [9], LogR.E-value [10], Condel [11], and several others [12,13]. These tools obtain information on amino acid conservation directly from alignments with homologous and distantly related sequences. Condel produces a combined prediction by integrating the scores obtained from different predictive tools.
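Condel's integration step is only summarized here. As a rough illustration of combining scores from several predictors, the sketch below rescales each tool's raw score onto a common 0 to 1 "damage" scale and takes a weighted average. The tool names, score ranges, orientations, and weights are hypothetical, and this is not Condel's published formula.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ScoreSpec:
    low: float                # minimum possible raw score for the tool
    high: float               # maximum possible raw score for the tool
    higher_is_damaging: bool  # orientation of the raw score
    weight: float = 1.0       # hypothetical weight, not taken from Condel


def normalized_damage(score: float, spec: ScoreSpec) -> float:
    """Map a raw score onto [0, 1], where 1 means most damaging."""
    if spec.high <= spec.low:
        raise ValueError("score range must have high > low")
    x = (score - spec.low) / (spec.high - spec.low)
    x = min(max(x, 0.0), 1.0)  # clamp scores that fall outside the declared range
    return x if spec.higher_is_damaging else 1.0 - x


def combined_score(scores: Dict[str, float], specs: Dict[str, ScoreSpec]) -> float:
    """Weighted average of normalized damage over the tools that produced a score."""
    total_weight = sum(specs[name].weight for name in scores)
    if total_weight == 0:
        raise ValueError("no weighted scores to combine")
    weighted = sum(specs[name].weight * normalized_damage(s, specs[name]) for name, s in scores.items())
    return weighted / total_weight


specs = {
    "tool_a": ScoreSpec(low=0.0, high=1.0, higher_is_damaging=False, weight=3.0),  # hypothetical
    "tool_b": ScoreSpec(low=0.0, high=1.0, higher_is_damaging=True, weight=1.0),   # hypothetical
}
print(combined_score({"tool_a": 0.0, "tool_b": 0.0}, specs))  # 0.75
```

Rescaling first matters because tools report scores on different ranges and in opposite directions; averaging raw values would mix incompatible scales.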
