Use of machine learning algorithms to classify binary protein sequences as highly-designable or poorly-designable.

Myron Peto, Andrzej Kloczkowski, Vasant Honavar, Robert L Jernigan

doi:10.1186/1471-2105-9-487

Open Access

https://doi.org/10.1186/1471-2105-9-487

Journal: BMC Bioinformatics
Publication Date: Nov 18, 2008
Citations: 35
License type: CC BY 2.0

Affiliation: Iowa State University

Abstract

Background: By using a standard Support Vector Machine (SVM) with a Sequential Minimal Optimization (SMO) method of training, Naïve Bayes, and other machine learning algorithms, we are able to distinguish between two classes of protein sequences: those folding to highly-designable conformations, and those folding to poorly- or non-designable conformations.

Results: First, we generate all possible compact lattice conformations for the specified shape (a hexagon or a triangle) on the 2D triangular lattice. Then we generate all possible binary hydrophobic/polar (H/P) sequences and, using a specified energy function, thread them through all of these compact conformations. If, for a given sequence, the lowest energy is obtained for a particular lattice conformation, we assume that this sequence folds to that conformation. Highly-designable conformations have many H/P sequences folding to them, while poorly-designable conformations have few or no H/P sequences. We classify sequences as folding to either highly- or poorly-designable conformations. We randomly selected subsets of the sequences belonging to highly-designable and poorly-designable conformations and used them to train several different standard machine learning algorithms.

Conclusion: By using these machine learning algorithms with ten-fold cross-validation, we are able to classify the two classes of sequences with high accuracy, in some cases exceeding 95%.
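The ten-fold cross-validation protocol mentioned in the abstract can be illustrated with a minimal sketch. It uses a hand-written Bernoulli Naïve Bayes classifier in plain Python rather than the SVM/SMO setup, and it is not the authors' code. The function names (`train_naive_bayes`, `predict`, `cross_validate`) are hypothetical, and the labels come from a made-up rule that stands in for the designability classes obtained by lattice threading.

```python
import itertools
import math
import random


def train_naive_bayes(rows, labels):
    """Fit a Bernoulli Naive Bayes model with Laplace (add-one) smoothing."""
    n_features = len(rows[0])
    model = {}
    for cls in (0, 1):
        members = [r for r, y in zip(rows, labels) if y == cls]
        prior = (len(members) + 1) / (len(rows) + 2)
        # P(position i is H | class), smoothed so no probability is 0 or 1.
        p_h = [(sum(r[i] for r in members) + 1) / (len(members) + 2)
               for i in range(n_features)]
        model[cls] = (math.log(prior), p_h)
    return model


def predict(model, row):
    """Return the class with the larger log-posterior for one sequence."""
    scores = {}
    for cls, (log_prior, p_h) in model.items():
        scores[cls] = log_prior + sum(
            math.log(p if bit else 1.0 - p) for bit, p in zip(row, p_h))
    return max(scores, key=scores.get)


def cross_validate(rows, labels, k=10, seed=0):
    """Mean accuracy over k folds; each item is held out exactly once."""
    order = list(range(len(rows)))
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]
    accuracies = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in order if i not in held]
        model = train_naive_bayes([rows[i] for i in train],
                                  [labels[i] for i in train])
        hits = sum(predict(model, rows[i]) == labels[i] for i in held_out)
        accuracies.append(hits / len(held_out))
    return sum(accuracies) / k


# Toy data: every H/P sequence of length 10, encoded as 1 = H, 0 = P.
sequences = list(itertools.product((0, 1), repeat=10))
# HYPOTHETICAL labelling rule (not from the paper): class 1 when H residues
# are the majority among the first five positions.
toy_labels = [1 if sum(s[:5]) >= 3 else 0 for s in sequences]

mean_accuracy = cross_validate(sequences, toy_labels, k=10)
print(f"mean 10-fold accuracy on toy data: {mean_accuracy:.3f}")
```

The accuracy printed here reflects only how easy the toy rule is to learn; it says nothing about the accuracies reported in the paper. With real data, `toy_labels` would be replaced by the highly-designable versus poorly-designable class of each sequence.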

Highlights

By using a standard Support Vector Machine (SVM) with a Sequential Minimal Optimization (SMO) method of training, Naïve Bayes, and other machine learning algorithms, we are able to distinguish between two classes of protein sequences: those folding to highly-designable conformations, and those folding to poorly- or non-designable conformations.
Through the use of complete enumerations of H/P sequences and compact lattice conformations, it has been found that most protein sequences fold to a relatively small number of so-called "highly-designable" conformations, while the remaining conformations have few, or no, sequences that fold to them [24,25,26,27,28,29,30,31,32,33].
The results obtained for lattice proteins suggest that, as for real proteins, designable conformations tend to exhibit structural symmetries. These findings show that a simple lattice model can demonstrate important traits that are mirrored in real proteins.

Summary

Introduction

By using a standard Support Vector Machine (SVM) with a Sequential Minimal Optimization (SMO) method of training, Naïve Bayes, and other machine learning algorithms, we are able to distinguish between two classes of protein sequences: those folding to highly-designable conformations, and those folding to poorly- or non-designable conformations. In coarse-grained models of proteins, a detailed atomistic description of the structure is replaced by a much simpler view in which each amino acid is represented by a single point. Theoretical models of proteins frequently replace the 20-letter amino acid alphabet with a reduced alphabet, up to the limit of a much simpler [...]. Through the use of complete enumerations of H/P sequences and compact lattice conformations, it has been found that most protein sequences fold to a relatively small number of so-called "highly-designable" conformations, while the remaining conformations have few, or no, sequences that fold to them [24,25,26,27,28,29,30,31,32,33]. In the present work we use a standard H/P alphabet and a 2D triangular lattice and apply machine learning algorithms to study protein designability for such a reduced model.
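The enumerate-and-thread idea can be sketched on a deliberately tiny case: the 7-site hexagon (a centre site and its six neighbours) on the triangular lattice. Several choices here are assumptions for illustration, not taken from the paper: the energy is the common H/P form of -1 per non-bonded H-H contact, a sequence counts as folding only when its lowest energy is reached by exactly one conformation, and a walk and its reversal are treated as separate conformations unless their contact maps coincide. The names `HEXAGON`, `compact_walks`, `contact_map`, and `energy` are hypothetical.

```python
from collections import Counter
from itertools import product

# Axial coordinates for the 2D triangular lattice: every site has 6 neighbours.
NEIGHBOUR_STEPS = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)]
# Smallest hexagon: a centre site plus its six neighbours (7 sites).
HEXAGON = [(0, 0)] + NEIGHBOUR_STEPS


def adjacent(a, b):
    """True when two lattice sites are nearest neighbours."""
    return (b[0] - a[0], b[1] - a[1]) in NEIGHBOUR_STEPS


def compact_walks(sites):
    """Yield every self-avoiding walk that visits each site exactly once."""
    site_set = set(sites)

    def extend(walk, visited):
        if len(walk) == len(sites):
            yield tuple(walk)
            return
        q, r = walk[-1]
        for dq, dr in NEIGHBOUR_STEPS:
            nxt = (q + dq, r + dr)
            if nxt in site_set and nxt not in visited:
                visited.add(nxt)
                walk.append(nxt)
                yield from extend(walk, visited)
                walk.pop()
                visited.remove(nxt)

    for start in sites:
        yield from extend([start], {start})


def contact_map(walk):
    """Pairs of chain positions that touch on the lattice but are not bonded."""
    n = len(walk)
    return frozenset((i, j) for i in range(n) for j in range(i + 2, n)
                     if adjacent(walk[i], walk[j]))


def energy(sequence, contacts):
    """ASSUMED energy: -1 for each H-H contact (1 = H, 0 = P)."""
    return -sum(1 for i, j in contacts if sequence[i] and sequence[j])


# Walks related by rotation or reflection share a contact map, so
# deduplicating on contact maps merges those symmetry-related copies.
conformations = sorted({contact_map(w) for w in compact_walks(HEXAGON)},
                       key=sorted)

designability = Counter()
for seq in product((0, 1), repeat=len(HEXAGON)):
    energies = [energy(seq, c) for c in conformations]
    lowest = min(energies)
    # A sequence "folds" only if a single conformation attains the minimum.
    if energies.count(lowest) == 1:
        designability[energies.index(lowest)] += 1

for index, contacts in enumerate(conformations):
    print(index, designability[index], sorted(contacts))
```

Each printed row gives a conformation index, the number of H/P sequences that fold uniquely to it under these assumptions, and its contact map. Sequences mapped to high-count conformations would form the highly-designable class, and the rest the poorly-designable class.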

