Abstract

Background

Functionally relevant artificial or natural mutations are difficult to assess or predict if no structure-function information is available for a protein. This is especially important to correctly identify functionally significant non-synonymous single nucleotide polymorphisms (nsSNPs) or to design a site-directed mutagenesis strategy for a target protein. A new and powerful methodology is proposed to guide these two decision strategies, based only on conservation rules of physicochemical properties of amino acids extracted from a multiple alignment of a protein family where the target protein belongs, with no need of explicit structure-function relationships.

Results

A statistical analysis is performed over each amino acid position in the multiple protein alignment, based on different amino acid physical or chemical characteristics, including hydrophobicity, side-chain volume, charge and protein conformational parameters. The variances of each of these properties at each position are combined to obtain a global statistical indicator of the conservation degree of each property. Different types of physicochemical conservation are defined to characterize relevant and irrelevant positions. The differences between statistical variances are taken together as the basis of hypothesis tests at each position to search for functionally significant mutable sites and to identify specific mutagenesis targets. The outcome is used to statistically predict physicochemical consensus sequences based on different properties and to calculate the amino acid propensities at each position in a given protein. Hence, amino acid positions are identified that are putatively responsible for function, specificity, stability or binding interactions in a family of proteins. Once these key functional positions are identified, position-specific statistical distributions are applied to divide the 20 common protein amino acids in each position of the protein's primary sequence into a group of functionally non-disruptive amino acids and a second group of functionally deleterious amino acids.

Conclusions

With this approach, not only conserved amino acid positions in a protein family can be labeled as functionally relevant, but also non-conserved amino acid positions can be identified to have a physicochemically meaningful functional effect. These results become a discriminative tool in the selection and elaboration of rational mutagenesis strategies for the protein. They can also be used to predict if a given nsSNP, identified, for instance, in a genomic-scale analysis, can have a functional implication for a particular protein and which nsSNPs are most likely to be functionally silent for a protein. This analytical tool could be used to rapidly and automatically discard any irrelevant nsSNP and guide the research focus toward functionally significant mutations. Based on preliminary results and applications, this technique shows promising performance as a valuable bioinformatics tool to aid in the development of new protein variants and in the understanding of function-structure relationships in proteins.
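
As a rough illustration of the first step described in the abstract (a per-position variance of a physicochemical property across a multiple alignment), the sketch below computes the variance of hydrophobicity at each column. It is not the authors' implementation: the Kyte-Doolittle hydropathy scale, the function name `column_variances`, the gap handling and the toy alignment are all assumptions made for the example, and the combination of variances and the hypothesis tests mentioned in the abstract are not reproduced.

```python
from statistics import pvariance

# Kyte-Doolittle hydropathy scale (an assumed choice; the abstract does not
# say which hydrophobicity scale is used).
KD_HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def column_variances(alignment, scale):
    """Population variance of `scale` at each alignment column.

    Characters missing from `scale` (gaps, ambiguous residues) are skipped.
    A column with no scorable residue yields None.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    variances = []
    for col in range(length):
        values = [scale[seq[col]] for seq in alignment if seq[col] in scale]
        variances.append(pvariance(values) if values else None)
    return variances


# Hypothetical toy alignment, for illustration only.
toy_alignment = ["MKVLA", "MRILG", "MKLLS", "MRVLT"]
variances = column_variances(toy_alignment, KD_HYDROPATHY)
for position, var in enumerate(variances, start=1):
    print(position, var)
```

In this toy case the first and fourth columns are strictly conserved (variance zero), the second and third vary in residue identity but keep a low hydrophobicity variance, and the last column has a much larger variance. That contrast is the kind of signal the abstract refers to when it says non-conserved positions can still be physicochemically conserved.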

Highlights

  • Relevant artificial or natural mutations are difficult to assess or predict if no structure-function information is available for a protein

  • If the aim is to identify non-synonymous single nucleotide polymorphisms (nsSNPs), mutations that could alter protein function are most probably located at invariable determinant positions (IDP) and variable determinant positions (VDP)

  • A statistical procedure has been designed and presented to semi-automatically identify functionally significant mutable positions in a protein, based on the conservation of physicochemical properties. Such positions are identified and classified into three groups, according to the influence their mutation could have on protein function. Positions where a mutation preserves the function and basic characteristics of the protein but modifies them slightly, and positions where a mutation is totally deleterious for the protein, are the most relevant when looking for nsSNPs. Only the former are important when developing site-directed mutagenesis strategies to generate variants with improved properties

Summary

Introduction

Relevant artificial or natural mutations are difficult to assess or predict if no structure-function information is available for a protein. This is especially important to correctly identify functionally significant non-synonymous single nucleotide polymorphisms (nsSNPs) or to design a site-directed mutagenesis strategy for a target protein. Site-directed mutagenesis is a tool used in rational protein design strategies to modify the structure or function of a protein to adapt it to particular performance requirements. For each amino acid of the protein, or at least for a select group of them, it is necessary to know its particular contribution to the structure and function of the protein as a whole. For a medium-sized protein, and with no additional information on possible structure-function relationships, an exhaustive search is practically impossible [3].
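
To give a sense of scale for why an exhaustive search is impractical, the short sketch below counts the distinct variants that differ from a wild-type sequence at exactly k positions. The length of 300 residues for a "medium-sized" protein and the function name `count_variants` are assumptions chosen for illustration; they do not come from the source.

```python
from math import comb


def count_variants(length, k, alphabet=20):
    """Number of sequences differing from the wild type at exactly k positions.

    Choose which k positions change, then one of the (alphabet - 1)
    alternative residues at each of them.
    """
    return comb(length, k) * (alphabet - 1) ** k


length = 300  # assumed size of a "medium-sized" protein
for k in (1, 2, 3):
    print(k, count_variants(length, k))
```

Under these assumptions there are 5,700 single mutants and more than 16 million double mutants, and the count keeps growing combinatorially with each additional substituted position.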

