Abstract

Protein analysis is a major research area in bioinformatics. Predicting important sites in proteins is a particular focus, because it can reduce the cost and time of experimental protein analysis. Given our success with theoretical approaches to detecting horizontal gene transfer, we decided to apply a similar approach to the problem of predicting important residues in a protein chain. An efficient predictor requires classifiers that separate the important residues from the rest of the chain. Developing and refining two such classifiers is the topic of this thesis. The first classifier is based on information theory and uses entropy and mutual information to rate protein residues. We use multiple sequence alignments to calculate the entropy of a residue pair and its mutual information. The mutual information indicates how strongly the two residues are correlated and is therefore an indicator of co-evolution. Through statistical means, we identify residues whose entropy values are significant with respect to co-evolution. Using a threshold, we classify the top-rated residues as important sites of the protein. This classifier is very successful in detecting single nucleotide polymorphisms. The second classifier is based on the distribution of amino acids in a protein and focuses on detecting protein interfaces using concepts from machine learning. Based on existing data, we analyze the neighborhood of known interface residues and use a machine learning algorithm to create a hypothesis. This hypothesis is then used to predict interface residues on a selected protein chain. This classifier has very good accuracy, and its focus can easily be adjusted to suit different approaches to protein analysis. Together, the two classifiers offer a good basis for predicting important protein sites and show promising results in experiments. Because of the theoretical concepts involved, they can also be easily adapted for other analytical purposes.
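
The entropy and mutual information of a residue pair, as used by the first classifier, can be illustrated with a minimal sketch. It assumes the standard Shannon definitions, with the mutual information of two alignment columns computed as H(X) + H(Y) - H(X, Y), and is not necessarily the exact scoring used in the thesis. The function names and the toy alignment are hypothetical and serve illustration only.

```python
from collections import Counter
from math import log2


def column(alignment, i):
    """Return the residues found at position i of every aligned sequence."""
    return [seq[i] for seq in alignment]


def entropy(symbols):
    """Shannon entropy (in bits) of the empirical symbol distribution."""
    n = len(symbols)
    counts = Counter(symbols)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def mutual_information(alignment, i, j):
    """Mutual information of columns i and j: H(X) + H(Y) - H(X, Y)."""
    col_i = column(alignment, i)
    col_j = column(alignment, j)
    pairs = list(zip(col_i, col_j))  # joint observations of the residue pair
    return entropy(col_i) + entropy(col_j) - entropy(pairs)


# Hypothetical toy alignment: four sequences of equal length.
# Columns 0 and 1 always change together; columns 0 and 2 vary independently.
toy_alignment = ["ACDA", "ACEA", "GTDA", "GTEA"]

mi_01 = mutual_information(toy_alignment, 0, 1)
mi_02 = mutual_information(toy_alignment, 0, 2)
```

In this toy case the perfectly coupled columns 0 and 1 receive a higher mutual information than the independent columns 0 and 2. A threshold on such scores would then select the top-rated positions, but how the scores are aggregated per residue and how significance is assessed is left open here.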

