Abstract

We propose a protein secondary structure prediction method based on position-specific scoring matrix (PSSM) profiles and four physicochemical features: conformation parameters, net charge, hydrophobicity, and side chain mass. First, we determine the optimal window size and the optimal kernel function parameters for the support vector machine (SVM). Then, we train the SVM using the PSSM profiles generated by PSI-BLAST and the physicochemical features extracted from the CB513 data set. Finally, we apply a filter to refine the results predicted by the trained SVM. Our method reaches a Q3 of 79.52, an SOV94 of 86.10, and an SOV99 of 74.60; all three measures are higher than those of the SVMpsi and SVMfreq methods. This result indicates that including these physicochemical features improves protein secondary structure prediction.
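The windowing step and the Q3 measure mentioned above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names `window_features` and `q3`, the zero-padding at the sequence ends, and the three-state labels H, E, and C are assumptions made for the example. The sketch does not cover the SVM training itself or the refinement filter, whose details are not given here.

```python
from typing import List, Sequence


def window_features(per_residue: Sequence[Sequence[float]], window: int) -> List[List[float]]:
    """Build one input vector per residue by concatenating the feature
    vectors of all residues in a centered window.

    Positions that fall outside the sequence are zero-padded (an assumption;
    other padding schemes are possible).
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    if not per_residue:
        return []
    half = window // 2
    dim = len(per_residue[0])
    pad = [0.0] * dim
    rows: List[List[float]] = []
    for i in range(len(per_residue)):
        row: List[float] = []
        for j in range(i - half, i + half + 1):
            # Use the real features inside the sequence, padding outside it.
            row.extend(per_residue[j] if 0 <= j < len(per_residue) else pad)
        rows.append(row)
    return rows


def q3(predicted: str, observed: str) -> float:
    """Percentage of residues whose predicted state matches the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("sequences must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)
```

In this sketch, each residue's feature vector would hold its PSSM row followed by the physicochemical values, so a window of size w over d features per residue yields an input of length w × d for the classifier.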

Highlights

  • Many problems in molecular biology have been addressed in recent decades, including problems in genetics, structural biology, and drug design

  • A protein primary sequence is composed of amino acids; 20 different kinds of amino acids are found in protein sequences

  • Both the PHD and SVMfreq methods are based on the frequency profiles with multiple sequence alignment; the classifier used in the PHD method is a neural network whereas the classifier used in the SVMfreq method is a support vector machine

Summary

Introduction

Many problems in molecular biology have been addressed in recent decades, including problems in genetics, structural biology, and drug design. A protein primary sequence is composed of amino acids; 20 different kinds of amino acids are found in protein sequences. We investigate protein secondary structures based on protein sequences. The secondary structure of a protein arises from the different ways its amino acids fold, which depend on the sizes, shapes, and reactivity of their side chains and on their ability to form hydrogen bonds. Because side chains differ in size, in the number of electric charges, and in their affinity for water, the tertiary structures of proteins also differ. The study of protein structure is divided into secondary, tertiary, and quaternary levels. Given a protein primary sequence, its corresponding secondary structure can be revealed as follows:

Primary sequence: MFKVYGYDSNIHKCVYCDNAKRLLTVKKQPFEFINIMPEKGV

