Hydrophobic compaction, curvature of space and deciphering protein sequences

Jean-Paul Mornon

doi:10.1051/epn:2003104

Abstract

In the July 2000 issue of the Bulletin de la SFP [1], Yves-Henri Sanejouand and Georges Trinquier presented an overview of the surprising ability of simple topological models (cubic nets) to explain the fundamental three-dimensional (3D) folding properties of the polymers that we call proteins, those essential components of life. Here, through a number of different examples, I will illustrate another equally surprising aspect of another apparent simplicity, that relating to hydrophobic compaction, which governs the folding of these macromolecules and which directly serves as a useful tool to decipher genes, thereby opening new prospects in this post-genomic era.

A protein is a linear, unbranched polymer consisting of anywhere from a few dozen to a few thousand links. Nature has limited the chemical diversity of proteins, with the occasional exception, to twenty different types (the twenty common amino acids). All amino acids share the same backbone, differing in terms of their side chains (Fig. 1). Seven amino acids have an aliphatic or aromatic side chain, making them strongly hydrophobic: V (valine), I (isoleucine), L (leucine), F (phenylalanine), M (methionine), Y (tyrosine) and W (tryptophan). Six have a strongly hydrophilic side chain: D (aspartic acid), E (glutamic acid), N (asparagine), Q (glutamine), K (lysine) and R (arginine), while the other seven have intermediate properties: A (alanine), C (cysteine), T (threonine), G (glycine), P (proline), S (serine), and H (histidine). This distribution of hydrophobicity/hydrophilicity offers a clever range of blocks with which to build macromolecules exhibiting remarkable physicochemical properties. Under normal conditions, any fairly long polypeptide (from a few dozen to a few hundred amino acids) folds spontaneously in the presence of water into globular domains with a stable three-dimensional architecture; some can also fold specifically (often in helical form) within lipid membranes.
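The three-way grouping of the twenty common amino acids listed above can be written down as a simple lookup. The following is only a minimal sketch: the function names (`classify`, `hydrophobic_pattern`) and the example sequence are hypothetical illustrations, and the binary encoding shown is an assumed convenience for display, not the method described in the article.

```python
# Grouping of the twenty common amino acids, exactly as listed in the text.
HYDROPHOBIC = set("VILFMYW")    # aliphatic or aromatic side chains
HYDROPHILIC = set("DENQKR")     # strongly hydrophilic side chains
INTERMEDIATE = set("ACTGPSH")   # intermediate properties


def classify(residue: str) -> str:
    """Return the class of a one-letter amino acid code (hypothetical helper)."""
    r = residue.upper()
    if r in HYDROPHOBIC:
        return "hydrophobic"
    if r in HYDROPHILIC:
        return "hydrophilic"
    if r in INTERMEDIATE:
        return "intermediate"
    raise ValueError(f"not one of the twenty common amino acids: {residue!r}")


def hydrophobic_pattern(sequence: str) -> str:
    """Encode a sequence as '1' (strongly hydrophobic) or '0' (anything else).

    This binary view is an assumption made for illustration only.
    """
    return "".join("1" if r in HYDROPHOBIC else "0" for r in sequence.upper())


if __name__ == "__main__":
    example = "MKVLAEDW"  # made-up sequence, not taken from any real protein
    print([classify(r) for r in example])
    print(hydrophobic_pattern(example))
```

Reducing a sequence to its strongly hydrophobic positions makes the alternation of hydrophobic and non-hydrophobic stretches along the chain easy to inspect by eye.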
It is the dichotomy between hydrophobicity and hydrophilicity that acts as the driving force for these processes (e.g. [2]), as it does for many other physicochemical situations in the world around us. The succession of different types of amino acids along the polymer, which is specific to each protein, is called the primary structure, or sequence. This information is sufficient for the polypeptide chain to adopt a stable and unique three-dimensional structure in a suitable medium (mainly water), with the occasional exception (Fig. 1). Yet not all the positions of a given sequence have the same influence on the cooperative process of folding. As a result, ancestral proteins have undergone considerable modification during evolution, amino acid after amino acid, without altering their resulting three-dimensional structure or affecting their associated biological function(s). Thus, it is not infrequent that proteins with only a very low number of chemically conserved homologous positions along the polypeptide chain (cf. Fig. 4A), i.e. a very low level of sequence identity (10% for example), are in fact close cousins within the same structural or functional family. The result is that, while the number of natural sequences in
