Abstract

Background: As one of the most common post-transcriptional modifications (PTCM) in RNA, 5-cytosine-methylation plays important roles in many biological functions, such as RNA metabolism and cell fate decision. Accurate identification of 5-methylcytosine (m5C) sites on RNA helps researchers better understand the exact role of 5-cytosine-methylation in these functions. In recent years, computational methods for predicting m5C sites have attracted considerable interest because of their efficiency and low cost. However, neither the accuracy nor the efficiency of these methods is yet satisfactory, and both need further improvement.

Results: In this work, we developed a new computational method, m5CPred-SVM, to identify m5C sites in three species: H. sapiens, M. musculus and A. thaliana. To build the model, we first collected benchmark datasets following three recently published methods. Then, six types of sequence-based features were generated from RNA segments, and the sequential forward feature selection strategy was used to obtain the optimal feature subset. After that, the performance of models based on different learning algorithms was compared, and the model based on the support vector machine provided the highest prediction accuracy. Finally, m5CPred-SVM was compared with several existing methods, and the results showed that it offered substantially higher prediction accuracy than previously published methods. We expect m5CPred-SVM to become a useful tool for accurate identification of m5C sites.

Conclusion: In this study, by introducing position-specific propensity related features, we built a new model, m5CPred-SVM, to predict RNA m5C sites in three different species. The results show that our model outperformed the existing state-of-the-art models. Our model is available to users through a web server at https://zhulab.ahu.edu.cn/m5CPred-SVM.
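The sequential forward feature selection step mentioned in the Results can be illustrated with a minimal sketch. This is a generic greedy version, not the authors' implementation: the function name `sequential_forward_selection`, the stopping rule (stop when no candidate improves the score) and the `toy_score` function are all assumptions made for illustration. In practice the scoring callback would be something like a cross-validated SVM accuracy on the chosen feature subset.

```python
from typing import Callable, List, Tuple


def sequential_forward_selection(
    n_features: int,
    score_subset: Callable[[Tuple[int, ...]], float],
) -> Tuple[List[int], float]:
    """Greedy forward selection over feature indices 0..n_features-1.

    Each round adds the single unused feature that gives the highest score.
    Selection stops when no remaining feature improves on the current best.
    """
    selected: List[int] = []
    best_score = float("-inf")
    remaining = set(range(n_features))

    while remaining:
        # Score every subset formed by adding one unused feature.
        trials = [
            (score_subset(tuple(selected + [f])), f) for f in sorted(remaining)
        ]
        # Ties are broken in favour of the lower feature index.
        round_score, round_feature = max(trials, key=lambda t: (t[0], -t[1]))
        if round_score <= best_score:
            break  # no candidate improves the score
        selected.append(round_feature)
        remaining.remove(round_feature)
        best_score = round_score

    return selected, best_score


def toy_score(subset: Tuple[int, ...]) -> float:
    """Hypothetical stand-in for a cross-validated model score.

    Features 0, 2 and 3 help; every other feature costs a small penalty.
    """
    gain = {0: 0.20, 2: 0.10, 3: 0.05}
    return 0.5 + sum(gain.get(f, -0.01) for f in subset)


if __name__ == "__main__":
    chosen, score = sequential_forward_selection(5, toy_score)
    print("selected features:", chosen)
    print("score:", round(score, 3))
```

With the toy scoring function, the search picks features 0, 2 and 3 in that order and then stops, because adding either remaining feature lowers the score. Whether the original work selected individual features or whole feature types is not stated here, so the index-based interface is only one plausible reading.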

Highlights

  • As one of the most common post-transcriptional modifications (PTCM) in RNA, 5-cytosine-methylation plays important roles in many biological functions such as RNA metabolism and cell fate decision

  • Performance of each type of feature: Using a support vector machine (SVM) with ten-fold cross-validation, we evaluated the performance of the six types of extracted features for the three species, namely H. sapiens, M. musculus and A. thaliana

  • For A. thaliana, Table 4 shows that the three best-performing models were based on pseudo dinucleotide composition (PseDNC), 4NF and chemical property with density (CPD). Reported metrics: Sn (%), Sp (%), Pre (%), Acc (%), Matthews correlation coefficient (Mcc), F1 score and area under the curve (AUC)
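The evaluation metrics named above (Sn, Sp, Pre, Acc, Mcc and F1 score) all follow from the four confusion-matrix counts of a binary classifier. The sketch below uses the standard textbook definitions; the function name `binary_metrics` and the example counts are hypothetical and are not taken from the paper's tables. AUC is omitted because it needs ranked prediction scores rather than counts.

```python
import math
from typing import Dict


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute common binary-classification metrics from confusion counts.

    tp/fn: positive (m5C) sites predicted correctly / missed.
    tn/fp: negative sites predicted correctly / wrongly flagged.
    Ratios with a zero denominator are returned as 0.0.
    """

    def ratio(num: float, den: float) -> float:
        return num / den if den else 0.0

    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "Sn": ratio(tp, tp + fn),            # sensitivity (recall)
        "Sp": ratio(tn, tn + fp),            # specificity
        "Pre": ratio(tp, tp + fp),           # precision
        "Acc": ratio(tp + tn, tp + fp + tn + fn),
        "F1": ratio(2 * tp, 2 * tp + fp + fn),
        "Mcc": ratio(tp * tn - fp * fn, mcc_den),
    }


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    for name, value in binary_metrics(tp=40, fp=10, tn=45, fn=5).items():
        print(f"{name}: {value:.3f}")
```

In a ten-fold cross-validation setting, one reasonable choice is to pool the counts over all ten held-out folds before calling the function; averaging per-fold metrics is an alternative that can give slightly different numbers.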


Summary

Introduction

As one of the most common post-transcriptional modifications (PTCM) in RNA, 5-cytosine-methylation plays important roles in many biological functions, such as RNA metabolism and cell fate decision. Accurate identification of m5C sites in RNA is of great importance for understanding the mechanism and function of this modification. Both experimental and computational methods have been developed to determine and predict m5C sites in RNA. Experimental methods such as bisulfite sequencing [5, 12], m5C-RIP [15], Aza-IP [16], mi-CLIP [17] and RBS-seq [18] have had some success in identifying m5C sites in RNAs of different species. Computational methods can provide a faster and more cost-effective way to identify m5C sites.

