Abstract

Functional annotation of protein sequences with low similarity to well-characterized protein sequences is a major challenge of computational biology in the post-genomic era. The cyclin protein family is one such important family: its members share low sequence similarity, which makes discovering novel cyclins and establishing orthologous relationships among them a difficult task. The currently identified cyclin motifs and cyclin-associated domains do not represent all of the identified and characterized cyclin sequences. We describe a Support Vector Machine (SVM) based classifier, CyclinPred, which can predict cyclin sequences with high efficiency. The SVM classifier was trained with features of selected cyclin and non-cyclin protein sequences. The training features include amino acid composition, dipeptide composition, secondary structure composition and PSI-BLAST generated Position Specific Scoring Matrix (PSSM) profiles. Results obtained from leave-one-out cross-validation (the jackknife test), self-consistency and holdout tests show that the SVM classifier trained with PSSM profile features was more accurate than classifiers based on any of the other features alone or on hybrids of these features. A cyclin prediction server, CyclinPred, has been set up based on the SVM model trained with PSSM profiles. The CyclinPred prediction results indicate that the method may be used as a cyclin prediction tool, complementing conventional cyclin prediction methods.
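The two simplest feature types named in the abstract, amino acid composition and dipeptide composition, can be illustrated with a minimal Python sketch. It assumes the usual definitions (fraction of each of the 20 standard residues, and fraction of each of the 400 ordered residue pairs); the function names, and the choice to skip non-standard residues, are assumptions made for illustration and are not taken from the CyclinPred implementation.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, fixed order


def amino_acid_composition(seq):
    """Return a 20-element vector: the fraction of each standard residue."""
    residues = [r for r in seq.upper() if r in AMINO_ACIDS]
    if not residues:
        raise ValueError("sequence contains no standard residues")
    return [residues.count(a) / len(residues) for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Return a 400-element vector: the fraction of each ordered residue pair."""
    seq = seq.upper()
    # Keep only adjacent pairs in which both residues are standard
    # (assumption: pairs touching a non-standard residue are skipped).
    pairs = [
        seq[i:i + 2]
        for i in range(len(seq) - 1)
        if seq[i] in AMINO_ACIDS and seq[i + 1] in AMINO_ACIDS
    ]
    if not pairs:
        raise ValueError("sequence contains no standard dipeptides")
    counts = {}
    for pair in pairs:
        counts[pair] = counts.get(pair, 0) + 1
    return [
        counts.get(a + b, 0) / len(pairs)
        for a, b in product(AMINO_ACIDS, repeat=2)
    ]


if __name__ == "__main__":
    toy = "MKVLAAGK"  # hypothetical toy sequence, not a real cyclin
    print(len(amino_acid_composition(toy)), len(dipeptide_composition(toy)))
```

Either vector, or a concatenation of both (a hybrid feature set), could then be passed to an SVM library of choice as the fixed-length representation of a sequence.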

Highlights

  • The performance of PSI-BLAST was evaluated using the jackknife cross-validation method, in which each sequence in the training dataset was used in turn as the BLAST query sequence and the remaining sequences served as the BLAST database

  • Amino acid composition (AAC) based Support Vector Machine (SVM) classifiers trained with different kernels had similar accuracies. To check whether these accuracies were an artifact of the training dataset, we generated additional sets of non-cyclin sequences belonging to a few non-cyclin families and used them as the non-cyclin dataset for training the AAC-based SVM classifier; each classifier had comparable specificity and sensitivity values
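The jackknife (leave-one-out) procedure and the sensitivity and specificity values mentioned in the highlights above can be sketched as follows. This is a minimal illustration under stated assumptions: the `nearest_neighbour` function is a hypothetical stand-in classifier, used only so the example runs with the standard library; it is neither the PSI-BLAST search nor the SVM used by CyclinPred. All names and the toy feature vectors are invented for illustration.

```python
import math


def leave_one_out(samples, labels, predict):
    """Hold out each sample in turn and predict it from all the others.

    `predict(train_x, train_y, query)` must return a label.
    """
    predictions = []
    for i, query in enumerate(samples):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        predictions.append(predict(train_x, train_y, query))
    return predictions


def sensitivity_specificity(labels, predictions, positive=1):
    """Return (sensitivity, specificity) for a binary labelling."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == positive and p == positive)
    fn = sum(1 for y, p in pairs if y == positive and p != positive)
    tn = sum(1 for y, p in pairs if y != positive and p != positive)
    fp = sum(1 for y, p in pairs if y != positive and p == positive)
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


def nearest_neighbour(train_x, train_y, query):
    """Stand-in classifier (assumption): label of the closest training vector."""
    best = min(range(len(train_x)), key=lambda j: math.dist(train_x[j], query))
    return train_y[best]


if __name__ == "__main__":
    # Toy 2-D feature vectors; 1 = cyclin, 0 = non-cyclin (made-up data).
    samples = [[0.0, 0.1], [0.1, 0.0], [0.9, 1.0], [1.0, 0.9]]
    labels = [1, 1, 0, 0]
    predictions = leave_one_out(samples, labels, nearest_neighbour)
    print(sensitivity_specificity(labels, predictions))
```

Swapping `nearest_neighbour` for a function that trains an SVM on `train_x` and `train_y`, or that runs a similarity search against the held-in sequences, leaves the evaluation loop unchanged.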

Introduction

Cyclins were first identified in the early 1980s in the eggs of marine invertebrates [1,2]. They have since been discovered in many organisms [3,4]. Cyclins bind and activate members of the Cdk protein family to regulate the cell cycle. The periodicity of cyclin concentrations during the cell cycle leads to periodic oscillations in Cdk activity that governs the cell cycle control system. Different cyclin-Cdk complexes are activated at different points during the cell cycle [5,6,7,8]. Cyclins have been classified into four general classes based on function and timing of activity, namely G1, G1/S, S and M cyclins. As diverse multiple forms were discovered, the cyclins were further classified on the basis of amino acid sequence comparisons, such as G1: C, D, E and G2: A, B cyclins, and several other classes [9]. Cyclin homologues have also been found in various viruses, for example Herpesvirus saimiri and Kaposi’s sarcoma-associated herpesvirus [10].
