PCI-SS: MISO dynamic nonlinear protein secondary structure prediction

James R Green, Mohammed O Aboul-Magd, Michael J Korenberg

doi:10.1186/1471-2105-10-222

Open Access

https://doi.org/10.1186/1471-2105-10-222

Journal: BMC Bioinformatics	Publication Date: Jul 17, 2009
Citations: 54	License type: CC BY 2.0

Affiliation: Carleton University, Queen's University

Abstract

Background: Since the function of a protein is largely dictated by its three-dimensional configuration, determining a protein's structure is of fundamental importance to biology. Here we report on a novel approach to determining the one-dimensional secondary structure of proteins (distinguishing α-helices, β-strands, and non-regular structures) from primary sequence data. The approach makes use of Parallel Cascade Identification (PCI), a powerful technique from the field of nonlinear system identification.

Results: Using PSI-BLAST divergent evolutionary profiles as input data, dynamic nonlinear systems are built through a black-box approach to model the process of protein folding. Genetic algorithms (GAs) are applied to optimize the architectural parameters of the PCI models. The three-state prediction problem is broken down into a combination of three binary sub-problems, and protein structure classifiers are built using 2 layers of PCI classifiers. Careful construction of the optimization, training, and test datasets ensures that no homology exists between any training and testing data. A detailed comparison between PCI and 9 contemporary methods is provided over a set of 125 new protein chains guaranteed to be dissimilar to all training data. Unlike other secondary structure prediction methods, here a web service is developed to provide both human- and machine-readable interfaces to PCI-based protein secondary structure prediction. This server, called PCI-SS, is available at . In addition to a dynamic PHP-generated web interface for humans, a Simple Object Access Protocol (SOAP) interface is added to permit remote invocation of the PCI-SS service. This machine-readable interface facilitates incorporation of PCI-SS into multi-faceted systems biology analysis pipelines requiring protein secondary structure information, and greatly simplifies high-throughput analyses. XML is used to represent the input protein sequence data and also to encode the resulting structure prediction in a machine-readable format. To our knowledge, this represents the only publicly available SOAP interface for a protein secondary structure prediction service with a published WSDL interface definition.

Conclusion: Relative to the 9 contemporary methods included in the comparison, cascaded PCI classifiers perform well; however, PCI finds its greatest application as a consensus classifier. When PCI is used to combine a sequence-to-structure PCI-based classifier with the current leading ANN-based method, PSIPRED, the overall accuracy (Q3) is maintained while the rate of occurrence of a particularly detrimental error is reduced by up to 25%. This improvement in BAD score, combined with the machine-readable SOAP web service interface, makes PCI-SS particularly useful for inclusion in a tertiary structure prediction pipeline.
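The two figures of merit named in the conclusion, Q3 and the BAD score, can be illustrated with a minimal sketch. It assumes that Q3 is the fraction of residues whose three-state label (H for helix, E for strand, C for non-regular) is predicted correctly, and that the "particularly detrimental error" counted by the BAD score is a helix predicted as a strand or a strand predicted as a helix. The function names, the single-letter labels, the normalization over all residues, and the toy sequences are assumptions made for illustration, not the authors' implementation.

```python
def _check(observed: str, predicted: str) -> None:
    """Validate that both label strings are non-empty and equally long."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have equal length")
    if not observed:
        raise ValueError("label strings must be non-empty")


def q3_score(observed: str, predicted: str) -> float:
    """Fraction of residues whose three-state label is predicted correctly."""
    _check(observed, predicted)
    correct = sum(o == p for o, p in zip(observed, predicted))
    return correct / len(observed)


def bad_score(observed: str, predicted: str) -> float:
    """Fraction of residues with a helix/strand confusion (assumed definition)."""
    _check(observed, predicted)
    confusions = {("H", "E"), ("E", "H")}
    swaps = sum((o, p) in confusions for o, p in zip(observed, predicted))
    return swaps / len(observed)


# Toy example (hypothetical labels, not data from the paper).
observed = "HHHHCCEEEE"
predicted = "HHHECCEEEH"
print(f"Q3  = {q3_score(observed, predicted):.2f}")
print(f"BAD = {bad_score(observed, predicted):.2f}")
```

In this toy case two predictions share the same eight correct residues but differ in what kind of mistake the remaining two are; a consensus classifier that holds Q3 constant while lowering BAD is trading helix/strand confusions for less damaging ones.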

Highlights

Since the function of a protein is largely dictated by its three-dimensional configuration, determining a protein's structure is of fundamental importance to biology.
Although a wide variety of methods have been applied to this problem, including those based on artificial neural networks (ANNs) [3,4,5,6,7,8], hidden Markov models (HMMs) [8,9], information theory [5], linear programming [10], and linear discriminant analysis (LDA) [5], no method has achieved the theoretical maximum predictive Q3 accuracy of 88% [2].
PSIPRED-local refers to the output of PSIPRED v2.45 run locally when provided with position-specific scoring matrix (PSSM) data generated from the filtered NCBI non-redundant (nr) database as frozen on 3 May 2004.

Summary

Introduction

Since the function of a protein is largely dictated by its three-dimensional configuration, determining a protein's structure is of fundamental importance to biology. We report on a novel approach to determining the one-dimensional secondary structure of proteins (distinguishing α-helices, β-strands, and non-regular structures) from primary sequence data, which makes use of Parallel Cascade Identification (PCI), a powerful technique from the field of nonlinear system identification. Computational prediction techniques provide an attractive alternative to experimental structure determination; however, the accurate prediction of 3D protein structure directly from amino acid sequence data continues to elude researchers when homologous protein structures are not available for comparative modeling, or for longer domains in de novo modeling. As an intermediate but useful step, attempts have been made to determine the one-dimensional secondary structure of proteins from primary sequence data [2]. Note that this study focuses on predicting the secondary structure of globular proteins; proteins with coiled-coil regions or transmembrane domains are excluded.

Similar Papers

PCI-SS: Web-based human and machine interfaces for protein secondary structure prediction
Mohammed Aboul-Magd ... James R Green
01 May 2008

Data Mining for Protein Secondary Structure Prediction
Haitao Cheng ... Robert L Jernigan
01 Jan 2009

Prediction of protein secondary structure based on an improved channel attention and multiscale convolution module
Xin Jin ... Shaowen Yao
Frontiers in Bioengineering and Biotechnology | VOL. 10
22 Jul 2022

Prediction of protein secondary structure using large margin nearest neighbor classification
Wei Yang ... Kuanquan Wang
01 Jan 2010
