Abstract

Background: Circadian rhythms regulate several physiological and developmental processes in plants, so identifying genes with underlying circadian rhythmic features is pivotal. Although computational methods have been developed for identifying circadian genes, all of them are based on gene expression datasets. We could not find any sequence-based model, which motivated us to develop the present computational method for identifying the proteins encoded by circadian genes.

Results: A support vector machine (SVM) with seven kernels (linear, polynomial, radial, sigmoid, hyperbolic, Bessel and Laplace) was used for prediction, employing compositional, transitional and physico-chemical features. The highest accuracy, 62.48%, was achieved with the Laplace kernel under fivefold cross-validation. The developed model achieved 62.96% accuracy on an independent dataset. The SVM also outperformed other state-of-the-art machine learning algorithms, namely Random Forest, Bagging, AdaBoost, XGBoost and LASSO. We also performed proteome-wide identification of circadian proteins in two cereal crops, Oryza sativa and Sorghum bicolor, followed by functional annotation of the predicted circadian proteins with Gene Ontology (GO) terms.

Conclusions: To the best of our knowledge, this is the first computational method to identify circadian genes from sequence data. Based on the proposed method, we have developed an R package, PredCRG (https://cran.r-project.org/web/packages/PredCRG/index.html), that the scientific community can use for proteome-wide identification of circadian genes. The present study supplements the existing computational methods as well as wet-lab experiments for the recognition of circadian genes.
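
To make the Results more concrete, the following Python sketch shows one simple compositional feature (the fraction of each of the 20 standard amino acids in a protein) and a Laplace kernel evaluated on two such feature vectors. It is an illustration under stated assumptions, not the PredCRG implementation: the function names, the toy sequences, the value of `sigma` and the choice of the Euclidean norm in the kernel are all assumptions, and the transitional and physico-chemical features mentioned above are not covered.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def aa_composition(seq):
    """Return the 20-dimensional amino acid composition (fractions) of a protein."""
    seq = seq.upper()
    # Count only standard residues; ambiguous symbols such as X are ignored.
    counts = Counter(ch for ch in seq if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[a] / total for a in AMINO_ACIDS]

def laplace_kernel(x, y, sigma=1.0):
    """Laplace kernel exp(-sigma * ||x - y||); the Euclidean norm is an assumption."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    return math.exp(-sigma * dist)

# Toy sequences, made up for illustration only (not real circadian proteins).
seq_a = "MKTAYIAKQR"
seq_b = "MKTAYIAKQL"
k_ab = laplace_kernel(aa_composition(seq_a), aa_composition(seq_b))
```

In an SVM, kernel values of this kind would be computed for every pair of training proteins; the kernel equals 1 for identical feature vectors and decays towards 0 as the vectors move apart.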

Highlights

  • Circadian rhythms regulate several physiological and developmental processes of plants

  • We have developed an R-package for easy prediction of circadian genes (CRGs) by using the proteome-wide sequence data

  • Prediction analysis with different sequence length categories: prediction was performed with the full dataset and with sub-datasets in which 50% of the observations were randomly drawn from each of the CRG and non-CRG classes
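
The sub-dataset construction in the last highlight can be sketched as a per-class random draw. This is a minimal Python illustration, assuming that the 50% draw is made separately within each class; the function name `stratified_subsample`, the seed and the toy label counts are hypothetical and do not come from the study.

```python
import random

def stratified_subsample(labels, fraction=0.5, seed=0):
    """Return sorted indices of a random subsample drawn separately from each class."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    by_class = {}
    for idx, lab in enumerate(labels):
        by_class.setdefault(lab, []).append(idx)
    chosen = []
    for idxs in by_class.values():
        k = int(len(idxs) * fraction)  # number of observations to keep in this class
        chosen.extend(rng.sample(idxs, k))
    return sorted(chosen)

# Toy labels, made up for illustration: 6 CRG and 10 non-CRG observations.
labels = ["CRG"] * 6 + ["non-CRG"] * 10
subset = stratified_subsample(labels)
```

Drawing within each class keeps the ratio of CRG to non-CRG observations in the sub-dataset the same as in the full dataset.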


Summary

Introduction

Circadian rhythms regulate several physiological and developmental processes in plants. Although computational methods have been developed for identifying circadian genes, all of them are based on gene expression datasets. We could not find any sequence-based model, which motivated us to develop the present computational method for identifying the proteins encoded by circadian genes. The roles of the circadian system in regulating plant responses to different biotic and abiotic stresses have been well studied [23, 24]. Metabolism related to plant growth and development is regulated by the circadian clock (CC), which affects the quality and productivity of crops by altering their metabolites [25, 26]. As reported in earlier studies [32, 33], crop productivity can be enhanced by manipulating the CC through circadian up-regulation of photosynthetic carbon assimilation.

