Abstract

A variety of functionally important protein properties, such as secondary structure, transmembrane topology and solvent accessibility, can be encoded as a labeling of amino acids. Indeed, the prediction of such properties from the primary amino acid sequence is one of the core projects of computational biology. Accordingly, a panoply of approaches has been developed for predicting such properties; however, most such approaches focus on solving a single task at a time. Motivated by recent, successful work in natural language processing, we propose to use multitask learning to train a single, joint model that exploits the dependencies among these various labeling tasks. We describe a deep neural network architecture that, given a protein sequence, outputs a host of predicted local properties, including secondary structure, solvent accessibility, transmembrane topology, signal peptides and DNA-binding residues. The network is trained jointly on all these tasks in a supervised fashion, augmented with a novel form of semi-supervised learning in which the model is trained to distinguish between local patterns from natural and synthetic protein sequences. The task-independent architecture of the network obviates the need for task-specific feature engineering. We demonstrate that, for all of the tasks that we considered, our approach leads to statistically significant improvements in performance, relative to a single-task neural network approach, and that the resulting model achieves state-of-the-art performance.
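The multitask idea described above, a shared representation feeding several task-specific output layers, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the task names, label sets, window size and layer dimensions are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
EMBED_DIM, HIDDEN_DIM, WINDOW = 8, 16, 11

# Hypothetical label sets for three of the labeling tasks (sizes illustrative).
TASKS = {
    "secondary_structure": 3,    # helix / strand / coil
    "solvent_accessibility": 2,  # buried / exposed
    "transmembrane": 2,          # membrane / non-membrane
}

# Shared parameters: one residue-embedding table and one hidden layer,
# reused by every task; each task gets only its own small output head.
embed = rng.normal(scale=0.1, size=(len(AMINO_ACIDS), EMBED_DIM))
W_shared = rng.normal(scale=0.1, size=(WINDOW * EMBED_DIM, HIDDEN_DIM))
heads = {t: rng.normal(scale=0.1, size=(HIDDEN_DIM, k)) for t, k in TASKS.items()}

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def predict(window):
    """Predict all task labels for the residue at the center of `window`."""
    idx = [AMINO_ACIDS.index(a) for a in window]
    x = embed[idx].ravel()         # concatenated residue embeddings
    h = np.tanh(x @ W_shared)      # shared hidden representation
    return {t: softmax(h @ W) for t, W in heads.items()}

probs = predict("MKTAYIAKQRQ")     # one 11-residue window, all tasks at once
```

Because the embedding table and hidden layer are shared, gradient updates from any one task reshape the representation used by all of them, which is how the dependencies among the labeling tasks are exploited during joint training.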

Highlights

  • Proteins participate in every major biological process within every living cell

  • We focus on predicting local functional properties, which can be summarized as a labeling of amino acids

  • Motivated by language models in natural language processing, which learn to predict whether a given text sequence occurs naturally in English, we propose an auxiliary task that models the local patterns of amino acids naturally occurring in protein sequences. Training assigns positive labels to genuine fragments of natural protein sequences and negative labels to fragments that have been synthetically generated
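The data generation behind this auxiliary task can be sketched as follows. The particular corruption scheme shown here, replacing the central residue of each window with a different random amino acid, is our assumption by analogy with the NLP language-model setup; the paper's own negative-generation procedure may differ.

```python
import random

random.seed(0)
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def make_pairs(sequence, window=11):
    """Slide a window over a natural sequence. Each genuine window is a
    positive example (label 1); a synthetic negative (label 0) is made by
    replacing its central residue with a different, random amino acid."""
    half = window // 2
    pairs = []
    for i in range(half, len(sequence) - half):
        win = sequence[i - half:i + half + 1]
        pairs.append((win, 1))                      # natural fragment
        center = win[half]
        fake = random.choice([a for a in AMINO_ACIDS if a != center])
        pairs.append((win[:half] + fake + win[half + 1:], 0))  # synthetic
    return pairs

# Illustrative sequence (not from the paper's data set).
pairs = make_pairs("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ")
```

No manual annotation is needed for this task, which is what makes the semi-supervised augmentation cheap: any natural protein sequence supplies positives, and negatives are manufactured on the fly.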

Introduction

Proteins participate in every major biological process within every living cell. Elucidating protein function is therefore a central endeavor of molecular biology. Analogous to the functional labeling of amino acids, natural language can be annotated with tags indicating synonymous pairs of words, parts of speech, larger syntactic entities, named entities, semantic roles, etc. These labelings exhibit strong dependencies across tasks. Essential to the success of the Collobert and Weston system is the use of a deep neural network [3,4], which is able to learn a hierarchy of features relevant to the tasks at hand given only very basic inputs. Our work combines all three of these components (multitask learning, deep learning and an analog of the language model) to predict local protein properties.
