Abstract

RNA modification is an essential step towards the generation of new RNA structures, and it can alter RNA function or stability. Among the different modifications, 5-Hydroxymethylcytosine (5hmC) modification of RNA exhibits significant potential for a series of biological processes. Understanding the distribution of 5hmC in RNA is essential to determine its biological functionality. Although conventional sequencing techniques allow broad identification of 5hmC, they are both time-consuming and resource-intensive. In this study, we propose a new computational tool called iRNA5hmC-PS to tackle this problem. To build iRNA5hmC-PS, we extract a set of novel sequence-based features called Position-Specific Gapped k-mer (PSG k-mer) to capture as much sequential information as possible. Our feature analysis shows that the proposed PSG k-mer features contain vital information for the identification of 5hmC sites. We also use a group-wise feature importance calculation strategy to select a small subset of features containing maximum discriminative information. Our experimental results demonstrate that iRNA5hmC-PS substantially improves prediction performance: it achieves 78.3% prediction performance, which is 12.8% better than that reported in previous studies. iRNA5hmC-PS is publicly available as an online tool at http://103.109.52.8:81/iRNA5hmC-PS. Its benchmark dataset, source code, and documentation are available at https://github.com/zahid6454/iRNA5hmC-PS.
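The abstract names the PSG k-mer features but does not define them. As a rough illustration of what a position-specific gapped encoding could look like, the sketch below one-hot encodes, for every start position and every gap size, the pair of nucleotides separated by that gap. The function name `psg_kmer_features`, the `max_gap` parameter, the restriction to nucleotide pairs, and the one-hot layout are all assumptions made for this example and are not taken from the paper or its repository.

```python
from itertools import product

ALPHABET = "ACGU"
# All 16 ordered nucleotide pairs: AA, AC, AG, AU, CA, ...
PAIRS = ["".join(p) for p in product(ALPHABET, repeat=2)]


def psg_kmer_features(seq, max_gap=2):
    """Hypothetical position-specific gapped pair encoding.

    For each gap size g in 0..max_gap and each start position i, take the
    pair (seq[i], seq[i + g + 1]) and append a 16-dimensional one-hot
    vector for it. This is an illustrative assumption, not the paper's
    exact feature definition.
    """
    seq = seq.upper().replace("T", "U")  # accept DNA-style input as well
    features = []
    for gap in range(max_gap + 1):
        for i in range(len(seq) - gap - 1):
            pair = seq[i] + seq[i + gap + 1]
            # A pair with a non-ACGU symbol yields an all-zero block.
            features.extend(1 if pair == p else 0 for p in PAIRS)
    return features


example = psg_kmer_features("ACGUA", max_gap=1)
```

Because each block is tied to a specific position and gap, two sequences with the same overall composition but different arrangements produce different vectors, which is the general idea behind position-specific features.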

Highlights

  • Over the last few decades, a wide variety of challenging RNA-related research problems have surfaced

  • We believe iRNA5hmC-PS has the potential to be an accurate and efficient tool to identify 5-Hydroxymethylcytosine (5hmC) sites. iRNA5hmC-PS is publicly available as a web server at http://103.109.52.8:81/iRNA5hmC-PS, and the benchmark dataset, source code, and documentation for all the models are available at https://github.com/zahid6454/iRNA5hmC-PS

  • With our framework, which consists of several novel sequence representation modes, a mode-wise feature selector, and a predictor, we significantly outperform the current state-of-the-art method, which was established recently

Summary

Introduction

Over the last few decades, a wide variety of challenging RNA-related research problems have surfaced. RNA modification is one of the most important and challenging of them. Distinct RNA modifications have been identified in mRNA, tRNA, rRNA, and snRNA [2,3]. These modifications can affect several biological processes, such as transcription, pre-RNA splicing, RNA export, RNA degradation, and mRNA translation [4,5,6,7]. To assess the biological functionalities of RNA modifications, it is of paramount importance to determine their distribution in transcriptomes.
