Abstract

Knowing the amino acid sequence of a gene product is still not sufficient to understand its function at the molecular level: the function of a protein is largely determined by its three-dimensional structure. We can predict the structure of a protein from its amino acid sequence in some cases, but not in all. In other words, we cannot, as of today and probably for some years to come, simulate the folding process of a protein. The only reason we can address the problem of structure prediction at all is that proteins are a product of evolution. Evolutionary mechanisms imply that proteins mostly evolved via small sequence variations, usually single amino acid substitutions, insertions and deletions. Therefore the sequences of proteins that are “sufficiently” closely related in evolution preserve detectable similarities. An unfolded polypeptide is dangerous for the cell, because it exposes hydrophobic groups that favour aggregation with other proteins. Consequently, we can assume that each evolutionary step has produced a structure compatible with the function of the protein. Note that function is usually brought about by a few key amino acids, but depends on their correct positioning in the active site, that is, on the correct folding of the polypeptide. All of the above implies that evolutionarily related proteins have not only similar sequences but also similar structures. In other words, if two proteins have sequences sufficiently similar to guarantee that they are evolutionarily related, we can be reasonably certain that they also have similar structures. This observation forms the basis of a protein structure prediction method called “comparative modelling” or “modelling by homology”, which we applied to the GENCODE gene product set.
We analysed the putative structure of all the gene products of known structure or for which a reliable three-dimensional model could be built, asking whether these products could give rise to a functional element. The most striking result of our analysis is that, for more than half of these alternative transcripts, the resulting protein structure is likely to be substantially altered relative to that of the principal sequence. Our results leave open two important questions: what is the role of the putative non-functional variants and, equally important, how can we reconcile the number of functions of a human organism with its apparently very low coding content?
