Abstract
Background: Prediction of protein structural classes (α, β, α + β and α/β) from amino acid sequences is of great importance because it supports the study of protein function, regulation and interactions. Many methods have been developed for high-homology protein sequences, with prediction accuracies of up to 90%. However, for low-homology sequences whose average pairwise sequence identity lies between 20% and 40%, these methods perform relatively poorly, often yielding prediction accuracies below 60%.

Results: We propose a new method to predict protein structural classes on the basis of features extracted from the predicted secondary structures of proteins rather than directly from their amino acid sequences. It first uses PSIPRED to predict the secondary structure for each protein sequence. Then, the chaos game representation is employed to represent the predicted secondary structure as two time series, from which we generate a comprehensive set of 24 features using recurrence quantification analysis, K-string based information entropy and segment-based analysis. The resulting feature vectors are finally fed into a simple yet powerful Fisher's discriminant algorithm for the prediction of protein structural classes. We tested the proposed method on three low-homology benchmark datasets and achieved overall prediction accuracies of 82.9%, 83.1% and 81.3%, respectively. Comparisons with ten existing methods showed that our method consistently performs better on all the tested datasets, with overall accuracy improvements ranging from 2.3% to 27.5%. A web server that implements the proposed method is freely available at http://www1.spms.ntu.edu.sg/~chenxin/RKS_PPSC/.

Conclusion: The high prediction accuracy achieved by our proposed method is attributed to the design of a comprehensive feature set on the predicted secondary structure sequences, which is capable of characterizing the sequence order information, local interactions of the secondary structural elements, and spatial arrangements of α helices and β strands. Thus, it is a valuable method to predict protein structural classes, particularly for low-homology amino acid sequences.
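The chaos game representation step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the construction details, so the following are assumptions made for illustration only: the three secondary structure states are written as H, E and C, each is assigned to one corner of an equilateral triangle, the walk starts at the centroid, and each step moves halfway toward the corner of the current state. The x and y coordinates of the visited points then form the two time series. The function name `cgr_time_series` and the vertex layout are hypothetical, not taken from the paper.

```python
import math

# Assumed layout: one triangle corner per secondary structure state.
# The paper may use a different assignment or geometry.
VERTICES = {
    "H": (0.0, 0.0),                   # helix
    "E": (1.0, 0.0),                   # strand
    "C": (0.5, math.sqrt(3) / 2),      # coil
}


def cgr_time_series(ss, start=(0.5, math.sqrt(3) / 6)):
    """Map a secondary structure string to two coordinate series.

    Starting from `start` (the triangle centroid by default), each symbol
    moves the current point halfway toward that symbol's vertex. Returns
    (xs, ys), the x and y coordinates of the visited points.
    """
    x, y = start
    xs, ys = [], []
    for state in ss:
        if state not in VERTICES:
            raise ValueError(f"unknown secondary structure state: {state!r}")
        vx, vy = VERTICES[state]
        x, y = (x + vx) / 2, (y + vy) / 2
        xs.append(x)
        ys.append(y)
    return xs, ys


if __name__ == "__main__":
    xs, ys = cgr_time_series("CCHHHHEEEC")
    print(len(xs), len(ys))
```

Under these assumptions, every point stays inside the triangle, and the position after each step encodes the recent history of states, which is what makes the two series suitable inputs for time series analysis.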
Highlights
Prediction of protein structural classes (α, β, α + β and α/β) from amino acid sequences is of great importance, as it is beneficial to study protein function, regulation and interactions.
The chaos game representation is employed to represent a predicted secondary structure as two time series, from which we generate a comprehensive set of 24 features using recurrence quantification analysis, K-string based information entropy and segment-based analysis.
The recurrence quantification analysis aims to capture the sequence order information of the time series [17], the K-string based information entropy to reflect certain local interactions along the secondary structure [18], and the segment-based analysis to characterize the spatial arrangements of α helices and β strands.
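To make the idea of a K-string based entropy concrete, here is a small sketch. It computes the plain Shannon entropy of the frequency distribution of overlapping length-K substrings in a secondary structure string. This is an assumption chosen for illustration: the exact definition used in the paper (see [18]) is not given in this text and may differ. The function name `k_string_entropy` is hypothetical.

```python
import math
from collections import Counter


def k_string_entropy(ss, k):
    """Shannon entropy (in bits) of overlapping K-strings in `ss`.

    Illustrative definition only; the paper's K-string based
    information entropy may be defined differently.
    """
    if k < 1 or k > len(ss):
        raise ValueError("k must satisfy 1 <= k <= len(ss)")
    # Count every overlapping substring of length k.
    counts = Counter(ss[i:i + k] for i in range(len(ss) - k + 1))
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    example = "CCHHHHHHCCEEEECC"
    for k in (1, 2, 3):
        print(k, round(k_string_entropy(example, k), 3))
```

A sequence made of a single repeated state gives an entropy of zero, while sequences with many distinct local patterns give larger values, so a quantity of this kind reflects how varied the local arrangements of secondary structure states are.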
Summary
Prediction of protein structural classes (α, β, α + β and α/β) from amino acid sequences is of great importance, as it is beneficial to study protein function, regulation and interactions. Protein tertiary structures can be broadly categorized into four structural classes based on the types and arrangements of their secondary structural elements [2]. They are the α class, in which proteins contain mainly helices; the β class, containing mainly strands; and two classes with a mixture of α helices and β strands: the α + β class, in which the β strands are mainly antiparallel, and the α/β class, in which the β strands are mainly parallel. For proteins whose structural classes are known, the conformational search space is also significantly reduced [3].