Abstract

5-Methylcytosine (m5C) is a common nucleobase modification, and recent investigations have shown that it is prevalent in cellular RNAs, including mRNA, tRNA and rRNA. With the rapid accumulation of m5C site data, it has become both feasible and important to build an accurate model for predicting m5C sites in silico. For this purpose, we developed a web server named RNAm5Cfinder, which uses RNA sequence features and a machine learning method to predict RNA m5C sites in eight tissue/cell types from mouse and human. We confirmed the accuracy and usefulness of RNAm5Cfinder through independent tests. The results show that the comprehensive and cell-specific predictors pinpoint generic and tissue-specific m5C sites with an area under the ROC curve (AUC) of at least 0.77 and 0.87, respectively. The RNAm5Cfinder web server is freely available at http://www.rnanut.net/rnam5cfinder.
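The AUC figures above summarize how well a predictor's scores separate m5C sites from non-modified sites on an independent test set. As a minimal illustration (not RNAm5Cfinder's own evaluation code), the AUC equals the fraction of (positive, negative) site pairs in which the positive site receives the higher score, with ties counted as half. The function name `roc_auc` and the 0/1 label convention are assumptions made for this sketch.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the Mann-Whitney pairwise formulation.

    labels: 1 for an m5C (positive) site, 0 for a non-modified (negative) site.
    scores: predictor outputs, where a higher value means "more likely m5C".
    A tie between a positive and a negative score counts as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative site")

    # O(len(pos) * len(neg)) pair comparison: simple, fine for a small test set.
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Two m5C sites and two non-modified sites with hypothetical scores:
# 3 of the 4 positive/negative pairs are ranked correctly, so AUC = 0.75.
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

For large test sets, a rank-based version (sort all scores once, then sum the ranks of the positives) gives the same value in O(n log n) time.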

Highlights

  • We found two available online servers for predicting RNA m5C sites: iRNA-PseColl, developed by Feng et al., and M5C-HPCR, developed by Zhang et al. [13,14]

  • We further used tissue-specific training and independent test sets, in which the RNA m5C modification data came from experiments on a single tissue or cell type, to test and benchmark the tissue-specific m5C predictors (Table 1)

  • To train the machine learning model, the RNA sequence flanking each modified or non-modified site must be translated into a numeric feature encoding
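The translation step described above can be sketched with one-hot encoding, the scheme RNAm5Cfinder is reported to adopt: each nucleotide in a window centred on the candidate cytosine becomes a 4-element indicator vector. This is a minimal sketch under stated assumptions, not the server's actual code: the `ACGU` channel order, the `flank` window half-width, and the zero-vector padding for positions beyond the sequence ends are all hypothetical choices for illustration.

```python
from typing import List

# Assumed channel order (hypothetical; not specified in the source).
NUCLEOTIDES = "ACGU"


def one_hot_window(seq: str, center: int, flank: int) -> List[List[int]]:
    """One-hot encode seq[center - flank : center + flank + 1].

    `center` must index a cytosine (the candidate m5C site). `flank` is a
    hypothetical half-width parameter. Positions outside the sequence, or
    holding a non-ACGU symbol such as 'N', become all-zero vectors.
    """
    seq = seq.upper().replace("T", "U")  # accept DNA-style input as well
    if not 0 <= center < len(seq):
        raise IndexError("center lies outside the sequence")
    if seq[center] != "C":
        raise ValueError("center must be a cytosine (candidate m5C site)")

    window = []
    for i in range(center - flank, center + flank + 1):
        base = seq[i] if 0 <= i < len(seq) else "N"  # pad past either end
        window.append([1 if base == nt else 0 for nt in NUCLEOTIDES])
    return window


def flatten(window: List[List[int]]) -> List[int]:
    """Concatenate per-position vectors into one vector of length 4 * (2 * flank + 1)."""
    return [bit for vec in window for bit in vec]


features = flatten(one_hot_window("GACUG", center=2, flank=3))
print(len(features))  # 7 positions x 4 channels
```

Zero-vector padding lets sites near transcript ends keep the same feature length as interior sites, which a fixed-input classifier requires.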


Summary

Introduction

We compared RNAm5Cfinder's performance with that of other state-of-the-art published web servers for predicting RNA m5C sites on the same independent test set. To encode RNA sequences, RNAm5Cfinder adopts one-hot encoding. When we re-trained our predictor using Feng's coding strategy instead, performance decreased slightly (Fig. 2), indicating that one-hot encoding is at least comparable to the current state-of-the-art method for RNA m5C site prediction. Because the m5C modification spectrum differs between cell types and tissues, a single comprehensive predictor cannot accurately predict the m5C sites of each specific tissue or cell type.
