Abstract

RNA editing is a post-transcriptional RNA process that provides RNA and protein complexity for regulating gene expression in eukaryotes. It is challenging to predict RNA editing by computational methods. In this study, we developed a novel method to predict RNA editing based on a random forest method. A careful feature selection procedure was performed based on the Maximum Relevance Minimum Redundancy (mRMR) and Incremental Feature Selection (IFS) algorithms. Eighteen optimal features were selected from the 77 features in our dataset and used to construct a final predictor. The accuracy and MCC (Matthews correlation coefficient) values for the training dataset were 0.866 and 0.742, respectively; for the testing dataset, the accuracy and MCC were 0.876 and 0.576, respectively. The performance was higher using 18 features than all 77, suggesting that a small feature set was sufficient to achieve accurate prediction. Analysis of the 18 features was performed and may shed light on the mechanism and dominant factors of RNA editing, providing a basis for future experimental validation.
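The two metrics reported above can diverge on imbalanced data, which may be why the testing accuracy (0.876) sits well above the testing MCC (0.576). The sketch below computes both from the four confusion-matrix counts using the standard definitions. The function names and the example counts are illustrative assumptions and are not taken from the study.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal total is zero, a common convention
    for the otherwise undefined case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Hypothetical, imbalanced counts chosen only for illustration:
    # few true edited sites, many true non-edited sites.
    counts = dict(tp=10, tn=880, fp=20, fn=90)
    print("accuracy:", round(accuracy(**counts), 3))
    print("MCC:", round(mcc(**counts), 3))
```

Because MCC uses all four cells of the confusion matrix, it penalizes a predictor that scores well mostly by labeling the majority class correctly, whereas accuracy does not.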

Highlights

  • RNA editing is a post-transcriptional RNA process that provides RNA and protein complexity for regulating gene expression in eukaryotes

  • In primates especially, the most common form of editing is the conversion of adenosines to inosines, which is catalyzed by a family of enzymes called adenosine deaminase acting on RNA (ADAR) [4]

  • The training dataset was retrieved from the training samples used to build their Random Forest algorithm-based model, RFss1, as well as for training their sparse partial least-squares (SPLS)- and artificial neural network (ANN)-based models

Summary

Introduction

RNA editing is a post-transcriptional RNA process that provides RNA and protein complexity for regulating gene expression in eukaryotes. In primates especially, the most common form of editing is the conversion of adenosines to inosines, which is catalyzed by a family of enzymes called adenosine deaminase acting on RNA (ADAR) [4]. ADAR enzymes are encoded by three separate genes, ADAR1-3, which are located on chromosomes 1, 21, and 10, respectively. Hartner and colleagues knocked out the Adar1 gene in mice and found that Adar1−/− mice died early during embryonic development due to liver disintegration [5]. In addition to the ADAR family, the apolipoprotein B-editing catalytic subunit family (APOBEC1-4) also harbors editing activity [7]. Unlike the Adar1−/− mice, Apobec-1−/− mice are healthy and viable [9].
