Memetic approach for prediction of low resolution protein structures using lattice models

Kamrul Islam

doi:10.4225/03/589011094cb1d

Abstract

Proteins are biochemical compounds and consist essentially of linear chain of amino acids which fold quickly into a three dimensional native structure to facilitate specific biological functions. Various computational approaches, often studied to determine (in silico) protein structures, are categorised in two main categories, namely, ab initio approach (without any prior information) or the homology approach (template based). To reduce computational complexity, ab initio approaches often apply simple low resolution HP model of proteins which represent amino acids as hydrophobic (H) or hydrophilic (P), for investigating protein structure prediction (PSP) problem. In spite of this simplicity, the protein structure prediction (PSP) problem remains NP hard, emphasising the need for development of suitable evolutionary search algorithms. A memetic algorithm (MA) has the ability to perform both local and global search and also has flexible architecture. This thesis focuses on design and development of novel features for memetic search algorithm making it suitable for the complex ab initio approach for PSP studies. An energy function is developed which maintains an effective dominance of two important properties of amino acids, namely, hydrophobicity and hydrophilicity. Further, rather than a random individual generation technique, we propose an efficient technique that dynamically generates individuals. This technique is not only fast but it also ensures that the diversity of population is maintained. Moreover, highlighting the limitation of current techniques (absolute and relative) for encoding proteins, we develop a generalised non-isomorphic technique which can be applied to any lattice structure (2D or 3D lattice, FCC or triangular). The capability of MA is increased by incorporating two novel local search techniques based on mutation and pull moves. Memes, a concept similar to that proposed first by Dawkins in cultural evolution, are introduced as substructures that contain domain-specific knowledge which can be passed on to other individuals. At the core of the proposed MA design is the powerful and novel local search technique based on these memes. We also prove theoretically as to how these memes survive while continually undergoing the crossover and mutation operations. With complete meme identification, validation and transfer technique integrated within the proposed MA, the overall approach is referred to as cultural memetic algorithm (CMA). Not only is the PSP problem NP hard, it is also a complex multimodal problem. As all EAs have the tendency to get trapped into local minima due to genetic drift, techniques are necessary to surmount this difficulty. We propose a novel scalable niching technique, called Divide-and-Conquer Niching (DCN), that preserves and utilises the global domain information from multiple peaks (i.e., basins of attraction) and guides the search quickly to the best basin of attraction. Since the proposed niching technique is scalable in nature, we have developed a suitable architecture to enable running the DCN in parallel on a cluster. To cope with the complexity posed by long protein sequences, two effective techniques for mutation and crossover are developed, namely, conflict-resolution-based-crossover and conflict-resolution-based-mutation. Both the techniques resolve overlapping of the sequence with minimum possible changes. These techniques are deployed within the DCN on CMA to speed up convergence for long protein sequences. Experimental results show that DCN on CMA achieves optimum energy for various benchmark sequences studied on four widely used lattice models (i.e., 2D square, 3D cubic, 2D triangular and 3D FCC) and performs more reliably compared to major state of the art algorithms.

Full Text