Abstract

BackgroundThis study is motivated by the following three considerations: a) the physico-chemical properties of transmembrane (TM) proteins are distinctly different from those of globular proteins, necessitating the development of specialized structure prediction techniques, b) for many structural features no specialized predictors for TM proteins are available at all, and c) deep learning algorithms allow to automate the feature engineering process and thus facilitate the development of multi-target methods for predicting several protein properties at once.ResultsWe present AllesTM, an integrated tool to predict almost all structural features of transmembrane proteins that can be extracted from atomic coordinate data. It blends several machine learning algorithms: random forests and gradient boosting machines, convolutional neural networks in their original form as well as those enhanced by dilated convolutions and residual connections, and, finally, long short-term memory architectures. AllesTM outperforms other available methods in predicting residue depth in the membrane, flexibility, topology, relative solvent accessibility in its bound state, while in torsion angles, secondary structure and monomer relative solvent accessibility prediction it lags only slightly behind the currently leading technique SPOT-1D. High accuracy on a multitude of prediction targets and easy installation make AllesTM a one-stop shop for many typical problems in the structural bioinformatics of transmembrane proteins.ConclusionsIn addition to presenting a highly accurate prediction method and eliminating the need to install and maintain many different software tools, we also provide a comprehensive overview of the impact of different machine learning algorithms and parameter choices on the prediction performance.AllesTM is freely available at https://github.com/phngs/allestm.

Highlights

  • Starting with the seminal work of Qian and Sejnowski [1], only the sky seems to be the limit for the application of machine learning methods to sequence-based protein structure prediction

  • In addition to presenting a highly accurate prediction method and eliminating the need to install and maintain many different software tools, we provide a comprehensive overview of the impact of different machine learning algorithms and parameter choices on the prediction performance

  • While the z-coordinates are the only target where no other method is publicly available for comparison, several conclusions regarding the employed machine learning algorithm can be drawn from the performance overview shown in S1 Table

Read more

Summary

Introduction

Starting with the seminal work of Qian and Sejnowski [1], only the sky seems to be the limit for the application of machine learning methods to sequence-based protein structure prediction. One specific advantage of this group of methods is that they allow automating the feature engineering process and eliminate, or at least significantly alleviate, the arguably most time-consuming step in the development of bioinformatics prediction algorithms. This, in its turn, opens up the possibility of developing multi-target prediction methods, i.e. methods that predict a whole array of protein properties directly from input sequences or evolutionary profiles. This study is motivated by the following three considerations: a) the physico-chemical properties of transmembrane (TM) proteins are distinctly different from those of globular proteins, necessitating the development of specialized structure prediction techniques, b) for many structural features no specialized predictors for TM proteins are available at all, and c) deep learning algorithms allow to automate the feature engineering process and facilitate the development of multi-target methods for predicting several protein properties at once

Methods
Results
Discussion
Conclusion

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.