A random forest learning assisted “divide and conquer” approach for peptide conformation search

Xin Chen, Bing Yang, Zijing Lin

doi:10.1038/s41598-018-27167-w

Open Access

https://doi.org/10.1038/s41598-018-27167-w

Abstract

Computational determination of peptide conformations is challenging as it is a problem of finding minima in a high-dimensional space. The “divide and conquer” approach is promising for reliably reducing the search space size. A random forest learning model is proposed here to expand the scope of applicability of the “divide and conquer” approach. A random forest classification algorithm is used to characterize the distributions of the backbone φ-ψ units (“words”). A random forest supervised learning model is developed to analyze the combinations of the φ-ψ units (“grammar”). It is found that amino acid residues may be grouped as equivalent “words”, while the φ-ψ combinations in low-energy peptide conformations follow a distinct “grammar”. The finding of equivalent words gives the “divide and conquer” method the flexibility of fragment substitution. The learnt grammar is used to improve the efficiency of the “divide and conquer” method by removing unfavorable φ-ψ combinations without the need for dedicated human effort. The machine-learning-assisted search method is illustrated by efficiently searching the conformations of GGG/AAA/GGGG/AAAA/GGGGG through assembling the structures of GFG/GFGG. Moreover, the computational cost of the new method is shown to increase rather slowly with the peptide length.

Highlights

Structures are the basis for understanding the properties and functions of biomolecules such as peptides and proteins
Based on a random forest classification algorithm and multidimensional scaling (MDS) analysis, it is found that amino acid (AA) residues can be classified into groups according to similarities in their φ-ψ distributions
A random forest supervised learning model is built to analyze the combinations of the φ-ψ units

Summary

Introduction

Structures are the basis for understanding the properties and functions of biomolecules such as peptides and proteins. When benchmarked against the results of the systematic search method, the “divide and conquer” method has been shown to be both efficient and reliable for determining the structures of small peptides[2,15,16]. For numerical efficiency, the number of low-energy fragment conformations used to form the trial structures of the peptide should be minimized. This is made possible by a detailed analysis of the structural features, which ensures that the chosen fragment structures are capable of forming favorable inter-fragment interactions[2,15,16]. Applications to representative peptides show that the new method is efficient and highly reliable, as demonstrated by comparison with the systematic search results.
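To make the idea of forming trial structures from fragment conformations concrete, the following is a minimal sketch, not the authors' implementation. It assumes that each conformation is reduced to a tuple of coarse φ-ψ "word" labels, that fragments are chained wherever they agree on two overlapping residues, and that the "grammar" is a plain set of forbidden adjacent word pairs. The labels, the fragment list, the overlap rule, and the names `extend`, `assemble`, and `obeys_grammar` are all hypothetical; in the paper the grammar is learnt by a random forest model rather than written by hand.

```python
from itertools import product

# Hypothetical low-energy conformations of a three-residue fragment.
# Each entry is one coarse phi-psi "word" label per residue (illustrative only).
FRAGMENT_CONFS = [
    ("beta", "beta", "beta"),
    ("beta", "alpha", "beta"),
    ("alpha", "alpha", "alpha"),
    ("alpha", "beta", "alpha"),
]

# Hypothetical "grammar": adjacent word pairs assumed to be unfavorable.
FORBIDDEN_PAIRS = {("alpha", "beta")}


def extend(trials, fragments, overlap):
    """Append every fragment whose first `overlap` words match a trial's last ones."""
    extended = set()
    for trial, frag in product(trials, fragments):
        if trial[-overlap:] == frag[:overlap]:
            extended.add(trial + frag[overlap:])
    return extended


def assemble(fragments, n_residues, overlap=2):
    """Build trial structures of length `n_residues` from equal-length fragments."""
    step = len(fragments[0]) - overlap  # residues gained per extension
    trials = set(fragments)
    length = len(fragments[0])
    while length < n_residues:
        trials = extend(trials, fragments, overlap)
        length += step
    return trials


def obeys_grammar(conf, forbidden_pairs):
    """Return True if no adjacent pair of words is in the forbidden set."""
    return all(pair not in forbidden_pairs for pair in zip(conf, conf[1:]))


trials = assemble(FRAGMENT_CONFS, n_residues=4)
kept = {t for t in trials if obeys_grammar(t, FORBIDDEN_PAIRS)}
print(len(trials), len(kept))
```

With this toy input the assembly yields four trial structures for a four-residue chain, and the grammar filter keeps two of them. In a real search each surviving trial structure would still need to be optimized and ranked by energy, which this sketch leaves out.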

Objectives

Methods

Results

Conclusion