Abstract

The rise of high-throughput experiments has transformed how scientists approach biological questions. The ubiquity of large-scale assays that can test thousands of samples in a day has necessitated the development of new computational approaches to interpret this data. Among these tools, machine learning approaches are increasingly being utilized due to their ability to infer complex nonlinear patterns from high-dimensional data. Despite their effectiveness, machine learning (and in particular deep learning) approaches are not always accessible or easy to implement for those with limited computational expertise. Here we present PARROT, a general framework for training and applying deep learning-based predictors on large protein datasets. Using an internal recurrent neural network architecture, PARROT is capable of tackling both classification and regression tasks while only requiring raw protein sequences as input. We showcase the potential uses of PARROT on three diverse machine learning tasks: predicting phosphorylation sites, predicting transcriptional activation function of peptides generated by high-throughput reporter assays, and predicting the fibrillization propensity of amyloid beta with data generated by deep mutational scanning. Through these examples, we demonstrate that PARROT is easy to use, performs comparably to state-of-the-art computational tools, and is applicable for a wide array of biological problems.

Highlights

  • The past decade has seen an exponential increase in the rate at which biological data is generated [1]

  • Our motivation behind PARROT was to develop a powerful deep learning tool that is easy to implement into any large-scale protein analysis workflows (>1,000s of sequences) (Figure 1A)

  • While the approach taken by Erijman et al was computationally rigorous and successful, here we demonstrate that PARROT could readily be implemented in such a workflow without sacrificing performance (Figure 3A)

Read more

Summary

Introduction

The past decade has seen an exponential increase in the rate at which biological data is generated [1]. It is not uncommon to find deep mutational scanning (DMS) experiments that achieve nearly complete sequence coverage, or assays that test tens of thousands of peptides [3,4,5,6,7,8,9,10]. This abundance of data being generated has the potential to answer important biological questions, at the same time it significantly complicates experimental analysis

Objectives
Results
Conclusion
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.