PDP-CON: prediction of domain/linker residues in protein sequences using a consensus approach.

Piyali Chatterjee, Dariusz Plewczynski, Subhadip Basu, Mita Nasipuri, Julian Zubek, Mahantapas Kundu

doi:10.1007/s00894-016-2933-0

Abstract

The prediction of domain/linker residues in protein sequences is a crucial task in the functional classification of proteins, homology-based protein structure prediction, and high-throughput structural genomics. In this work, a novel consensus-based machine-learning technique was applied to residue-level prediction of domain/linker annotations in protein sequences, using ordered/disordered regions along protein chains and a set of physicochemical properties as features. Six different classifiers (decision tree, Gaussian naïve Bayes, linear discriminant analysis, support vector machine, random forest, and multilayer perceptron) were exhaustively explored for the residue-level prediction of domain/linker regions. Protein sequences from the curated CATH database were used for training and cross-validation experiments. The developed PDP-CON tool was then applied to mutually exclusive, independent proteins from the CASP-8, CASP-9, and CASP-10 datasets, and these test results are reported. An n-star quality consensus approach was used to combine the results yielded by the different classifiers. The average PDP-CON accuracy and F-measure values for the CASP targets were 0.86 and 0.91, respectively. The dataset, source code, and all supplementary materials for this work are available at https://cmaterju.org/cmaterbioinfo/ for noncommercial use.

Electronic supplementary material: The online version of this article (doi:10.1007/s00894-016-2933-0) contains supplementary material, which is available to authorized users.
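The abstract combines six per-residue classifiers with an n-star quality consensus and reports residue-level accuracy and F-measure. A minimal Python sketch of both ideas follows. It assumes that "n-star" means a residue receives the linker label only when at least n of the classifiers agree on it. That reading, the 0/1 label encoding, linker as the positive class, and the function names `n_star_consensus` and `accuracy_and_f_measure` are illustrative assumptions, not PDP-CON's published code.

```python
from typing import List, Sequence, Tuple

# Hypothetical label encoding for this sketch: 1 = linker residue, 0 = domain residue.
LINKER, DOMAIN = 1, 0


def n_star_consensus(predictions: Sequence[Sequence[int]], n: int) -> List[int]:
    """Return per-residue consensus labels.

    predictions[k][i] is the 0/1 label that classifier k assigns to residue i.
    A residue is labelled LINKER when at least n classifiers vote LINKER,
    otherwise DOMAIN (assumed reading of "n-star" consensus).
    """
    if not predictions:
        raise ValueError("need at least one classifier")
    if not 1 <= n <= len(predictions):
        raise ValueError("n must lie between 1 and the number of classifiers")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all classifiers must label the same residues")
    return [
        LINKER if sum(p[i] == LINKER for p in predictions) >= n else DOMAIN
        for i in range(length)
    ]


def accuracy_and_f_measure(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = LINKER
) -> Tuple[float, float]:
    """Residue-level accuracy and F-measure (F1) for the chosen positive class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label sequences must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (
        2 * precision * recall / (precision + recall) if precision + recall else 0.0
    )
    return accuracy, f_measure


# Usage: three hypothetical classifiers voting on six residues.
votes = [
    [0, 1, 1, 0, 0, 0],  # classifier A
    [0, 1, 0, 0, 1, 0],  # classifier B
    [0, 1, 1, 0, 0, 0],  # classifier C
]
consensus = n_star_consensus(votes, n=2)
acc, f1 = accuracy_and_f_measure([0, 1, 1, 1, 0, 0], consensus)
print(consensus)                   # [0, 1, 1, 0, 0, 0]
print(round(acc, 3), round(f1, 3))  # 0.833 0.8
```

This section does not state which class the paper treats as positive when computing the F-measure, so the sketch exposes that choice as the `positive` parameter.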

Highlights

Some simple combinations of protein secondary-structural elements that are found to occur frequently in proteins are referred to as super-secondary structures or motifs.
We examined six different machine-learning algorithms for residue-level prediction of protein domain boundaries from sequence information, using a carefully chosen feature set consisting of a hydrophobicity index, a linker index, polarity values, ordered/disordered regions in the protein sequence, and flexibility parameters.
We considered six different types of classifiers: decision tree (DT), Gaussian naïve Bayes (GNB), linear discriminant analysis (LDA), support vector machine (SVM), random forest (RF), and multilayer perceptron (MLP).
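The highlights list a per-residue feature set: a hydrophobicity index, a linker index, polarity values, ordered/disordered regions, and flexibility parameters. The minimal Python sketch below shows one way such residue-level feature vectors might be assembled. It assumes each physicochemical property is smoothed over a centred sliding window and that disordered regions arrive as 0/1 flags from an external predictor. The window width, the use of the Kyte-Doolittle scale as the hydrophobicity index, and the helper names `window_average` and `residue_features` are hypothetical choices, not details taken from the paper.

```python
from typing import Dict, List, Mapping, Sequence

# Kyte-Doolittle hydropathy values, used here only as a stand-in hydrophobicity
# index; the scale actually used by PDP-CON is not given in this section.
KYTE_DOOLITTLE: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_average(sequence: str, scale: Mapping[str, float], window: int = 7) -> List[float]:
    """Average a per-residue property over a centred sliding window.

    Windows are truncated at the chain termini, so terminal residues are
    averaged over fewer neighbours. The default width of 7 is hypothetical.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    values = [scale[aa] for aa in sequence.upper()]
    averaged = []
    for i in range(len(values)):
        # Python slicing clips the upper bound at the end of the chain.
        segment = values[max(0, i - half): i + half + 1]
        averaged.append(sum(segment) / len(segment))
    return averaged


def residue_features(
    sequence: str,
    scales: Mapping[str, Mapping[str, float]],
    disorder: Sequence[int],
    window: int = 7,
) -> List[List[float]]:
    """Build one feature vector per residue.

    Each vector holds the window-averaged value of every supplied scale (in the
    mapping's iteration order) followed by the residue's 0/1 disordered-region
    flag, which is assumed to come from an external disorder predictor.
    """
    if len(disorder) != len(sequence):
        raise ValueError("need one disorder flag per residue")
    columns = [window_average(sequence, scale, window) for scale in scales.values()]
    return [
        [column[i] for column in columns] + [float(disorder[i])]
        for i in range(len(sequence))
    ]
```

Further properties would be added as extra entries in `scales`, for example a polarity or flexibility table supplied by the reader; their values are not reproduced here. The resulting per-residue vectors are the kind of input any of the six classifiers could be trained on.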

Summary

Introduction

Some simple combinations of protein secondary-structural elements that occur frequently in proteins are referred to as super-secondary structures or motifs. Several motifs pack together to form compact, local, semi-independent units called domains. A domain is a segment of a polypeptide chain that can fold into a three-dimensional structure irrespective of the presence of other segments of the chain [1]. The overall 3D structure of a protein’s polypeptide chain is referred to as its tertiary structure, and the domain is the fundamental building block of that tertiary structure. Each domain contains a hydrophobic core built from secondary-structural units connected by loop regions. Two-thirds of the proteins in unicellular organisms and more than 80% of those in metazoans are multidomain proteins created as a result of gene duplication events. As the complexity of an organism increases, the number of domains in its proteins also tends to increase.



Journal: Journal of Molecular Modeling	Publication Date: Mar 11, 2016
Citations: 38	License type: CC BY 4.0


Similar Papers

PiRaNhA: a server for the computational prediction of RNA-binding residues in protein sequences
Y Murakami ... S Jones
Nucleic Acids Research | VOL. 38 | 27 May 2010

A novel method for predicting RNA-interacting residues in proteins using a combination of feature-based and sequence template-based methods
Jiazhi Song ... Ping Zhang
Biotechnology & Biotechnological Equipment | VOL. 33 | 01 Jan 2019

Predict prokaryotic proteins through detecting N-formylmethionine residues in protein sequences using support vector machine
Zheng Rong Yang
BioSystems | VOL. 97 | 08 Jun 2009

Prediction of domain boundaries in protein sequences using predicted secondary structure and physicochemical properties of amino acids
Srija Chakraborty ... Subhasish Das
01 Mar 2014
