Abstract

RNA 5-hydroxymethylcytosine (5hmC) modification plays an important role in a series of biological processes. Characterizing its distribution in the transcriptome is fundamental to revealing the biological functions of 5hmC. Sequencing-based technologies allow the high-throughput identification of 5hmC; however, they are labor-intensive, time-consuming, and expensive. Thus, there is an urgent need for more effective and efficient computational methods, at least as a complement to the high-throughput technologies. In this study, we developed iRNA5hmC, a computational predictive protocol that identifies RNA 5hmC sites using machine learning. In this predictor, we introduced a sequence-based feature algorithm consisting of two feature representations, (1) k-mer spectrum and (2) positional nucleotide binary vector, to capture the sequential characteristics of 5hmC sites. We then applied a two-stage feature space optimization strategy to improve the feature representation ability, and trained a predictive model using a support vector machine (SVM). Our feature analysis results showed that feature optimization helps to capture the most discriminative features. Compared with well-known existing feature descriptors, our proposed representations more accurately separate true 5hmC sites from non-5hmC sites. To the best of our knowledge, iRNA5hmC is the first RNA 5hmC predictor that can make predictions from RNA primary sequences alone, without any prior experimental knowledge. Importantly, we have established an easy-to-use webserver, which is currently available at http://server.malab.cn/iRNA5hmC. We expect it to be a useful tool for the prediction of 5hmC sites.
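
To illustrate the two feature representations named above, the following is a minimal sketch of how a k-mer spectrum and a positional nucleotide binary vector might be computed and concatenated for one sequence. It is not the iRNA5hmC implementation: the function names, the ACGU alphabet, the default k = 2, the frequency normalization, and the all-zero encoding of ambiguous bases are all assumptions made for illustration, and the two-stage feature optimization and SVM training are not shown.

```python
from itertools import product

ALPHABET = "ACGU"  # assumption: standard RNA alphabet; not stated in the source


def kmer_spectrum(seq, k=2):
    """Return the frequency of every possible k-mer, in lexicographic order."""
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    windows = len(seq) - k + 1
    for i in range(windows):
        kmer = seq[i:i + k]
        if kmer in counts:  # windows containing ambiguous bases (e.g. N) are skipped
            counts[kmer] += 1
    total = max(windows, 1)  # avoid division by zero for very short sequences
    return [counts[kmer] / total for kmer in kmers]


def positional_binary(seq):
    """One-hot encode each position: A=1000, C=0100, G=0010, U=0001.

    Bases outside ALPHABET become 0000 (an assumption for this sketch).
    """
    vector = []
    for base in seq:
        vector.extend(1 if base == ref else 0 for ref in ALPHABET)
    return vector


def encode(seq, k=2):
    """Concatenate both representations into one feature vector (hypothetical helper)."""
    seq = seq.upper().replace("T", "U")  # accept DNA-style input as well
    return kmer_spectrum(seq, k) + positional_binary(seq)
```

For a sequence of length L, this sketch yields 4^k spectrum features followed by 4L positional features; a vector of this kind is what a feature selection step and an SVM classifier would then consume.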

Highlights

  • RNA can be decorated by various chemical modifications (Boccaletto et al, 2018)

  • iRNA5hmC is the first RNA 5-hydroxymethylcytosine site predictor that makes predictions from RNA primary sequences without prior experimental knowledge

  • We compared the performance of the three kernels



Introduction

RNA can be decorated by various chemical modifications (Boccaletto et al, 2018). Over the past decades, more than 100 kinds of modifications have been identified in mRNA, tRNA, rRNA, snRNA, and other RNA species (Shi et al, 2019). RNA modifications have been shown to be associated with human diseases (Jonkhout et al, 2017), including cancer, cardiovascular diseases, Bowen–Conradi syndrome, obesity, and diabetes. Determining their distributions in the transcriptomes is important for decoding the biological and physiological functions of RNA modifications (Conde et al, 2015; Chen et al, 2019; Pian et al, 2019; Yuan et al, 2019). Another kind of RNA modification, 5-hydroxymethylcytosine (5hmC), is formed by TET-mediated oxidation of m5C (Fu et al, 2014). Later, Huber et al (2015) found that 5hmC is pervasive in all three domains of life across a variety of species.

