Prediction of protein structural features from sequence data based on Shannon entropy and Kolmogorov complexity.

Robert Paul Bywater, Jose M Sanchez-Ruiz

doi:10.1371/journal.pone.0119306

Open Access

https://doi.org/10.1371/journal.pone.0119306

Journal: PloS one	Publication Date: Apr 9, 2015
Citations: 34	License type: CC BY 4.0

Affiliation: Northeast Catholic College

Abstract

While the genome of a given organism stores the information necessary for the organism to function and flourish, it is the proteins encoded by the genome that, perhaps more than anything else, characterize the phenotype of that organism. It is therefore not surprising that one of the many approaches to understanding and predicting protein folding and properties has come from genomics, and more specifically from multiple sequence alignments. In this work I explore ways in which data derived from sequence alignments can be used to investigate, in a predictive way, three different aspects of protein structure: secondary structures, inter-residue contacts, and the dynamics of switching between different states of the protein. In particular, the use of Kolmogorov complexity has identified a novel pathway towards achieving these goals.

Highlights

In order to fulfil their mission, proteins have many functions to perform, including, in the first place, the need to fold correctly.
Note: there is a slight rightwards displacement or offset because AREA is calculated for residues i and i+2 for each i. The conclusion from these two studies is that VAR, ENT and Kolmogorov complexity (KOL) all correlate with SSE patterns and backbone geometry, which suggests ways of using them in a predictive fashion for secondary structures.
It is evident just from a perusal of the data presented here that VAR, ENT and KOL reveal essential features related to protein structure, function and dynamics.

Summary

Introduction

In order to fulfil their mission, proteins have many functions to perform, including, in the first place, the need to fold correctly. Note: there is a slight rightwards displacement or offset because AREA is calculated for residues i and i+2 for each i (this will tend to make the correlations look “weaker”). The conclusion from these two studies is that VAR, ENT and KOL all correlate with SSE patterns and backbone geometry, which suggests ways of using them in a predictive fashion for secondary structures. Moving on to considerations of three-dimensional structure, a similar behavior is observed for KOL (and for VAR and ENT, but not shown graphically) in synchrony with solvent accessibilities (calculated from crystal structure coordinates using WHAT IF) and B-values (experimental) for these proteins. These are plotted separately for the A and I structures in Fig 2 for the insulin receptor (and in figures B, E, H, K, N, Q, T, W, Z in S1 File for the other proteins), with OACA/OACI denoting the accessibilities for A and I respectively, and BVLA/BVLI likewise the B-values. For 3D prediction, KOL seems to be emerging as the method of choice.
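As an illustration of the kind of per-position quantities discussed above, the following minimal Python sketch computes a Shannon entropy and a compression-based complexity estimate for each column of a multiple sequence alignment. It is not the procedure used in this work, which is not described in this summary. Kolmogorov complexity itself is uncomputable, so the sketch substitutes the zlib compression ratio as a crude upper-bound proxy. The function names, the treatment of gap characters as ordinary symbols, and the toy alignment are all assumptions made for illustration.

```python
import math
import zlib
from collections import Counter


def column_entropy(column: str) -> float:
    """Shannon entropy, in bits, of the symbol distribution in one alignment column.

    Assumption: every character, including a gap such as '-', counts as a symbol.
    """
    total = len(column)
    counts = Counter(column)
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def compression_complexity(column: str) -> float:
    """Compressed size divided by raw size for one column.

    Assumption: this zlib ratio is only a computable stand-in for Kolmogorov
    complexity. For short columns the fixed zlib overhead dominates, so the
    ratio is meaningful only for alignments with many sequences.
    """
    raw = column.encode("ascii")
    return len(zlib.compress(raw, 9)) / len(raw)


def per_position_profiles(alignment: list[str]) -> tuple[list[float], list[float]]:
    """Return (entropy profile, complexity profile), one value per alignment column.

    Assumption: all aligned sequences have the same length.
    """
    columns = ["".join(col) for col in zip(*alignment)]
    ent = [column_entropy(c) for c in columns]
    kol = [compression_complexity(c) for c in columns]
    return ent, kol


if __name__ == "__main__":
    # Hypothetical toy alignment: column 1 is conserved, column 3 is fully variable.
    toy_alignment = ["ACA", "ACC", "ACD", "ACE"]
    ent_profile, kol_profile = per_position_profiles(toy_alignment)
    for position, (e, k) in enumerate(zip(ent_profile, kol_profile), start=1):
        print(f"position {position}: entropy = {e:.2f} bits, compression ratio = {k:.2f}")
```

In this sketch a fully conserved column has zero entropy and a column with four equally frequent residues has an entropy of two bits. Profiles of this kind could then be compared, position by position, with secondary structure assignments, solvent accessibilities or B-values.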

Concluding remarks

Findings

Methods