Abstract

RNA editing is a post-transcriptional modification that alters the RNA sequence through insertions, deletions, and base substitutions. These changes can directly affect protein expression and structure. In humans, the most common form is adenosine-to-inosine (A-to-I) editing, which occurs mostly in Alu regions; because next-generation sequencing reads inosine as guanine, these sites appear as A-to-G changes. Building on these properties, several algorithms and software tools have been developed to identify RNA editing sites. In this paper, we present TAE-ML, a random forest model for detecting RNA editing sites among SNVs in a VCF file called from an RNA-Seq experiment. TAE-ML combines multiple filters that remove artifactual variants with the sequence surrounding each SNV to identify RNA editing sites accurately. The model was trained on the YH dataset from previous research and tested on the HeLa, CH24T, and CH62T datasets. On the same candidate sites, TAE-ML achieved higher accuracy and F1 score than both RED-ML and RDDpred. These results suggest that our model is capable of predicting true RNA editing sites. We designed TAE-ML as a simple Python script so that it runs easily on any operating system that supports Python.
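
To make the candidate-selection idea concrete, here is a minimal Python sketch of how A-to-G SNVs might be pulled from VCF records and how the surrounding sequence could be turned into numeric features for a classifier such as a random forest. It is not TAE-ML's actual code: the function names, the window size `k`, the one-hot encoding, and the choice to also accept T-to-C calls (A-to-G on the reverse strand, assuming unstranded data) are all illustrative assumptions.

```python
from typing import Iterable, Iterator, List, Tuple

BASES = "ACGT"

# Assumption: with unstranded data, A-to-G editing on a minus-strand
# transcript shows up as T-to-C relative to the reference.
EDITING_LIKE = {("A", "G"), ("T", "C")}


def parse_snvs(vcf_lines: Iterable[str]) -> Iterator[Tuple[str, int, str, str]]:
    """Yield (chrom, pos, ref, alt) for single-base substitutions only."""
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and VCF header lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos = fields[0], int(fields[1])
        ref, alt = fields[3].upper(), fields[4].upper()
        if len(ref) == 1 and len(alt) == 1 and ref in BASES and alt in BASES:
            yield chrom, pos, ref, alt


def is_editing_candidate(ref: str, alt: str) -> bool:
    """True for substitutions consistent with A-to-I editing read as A-to-G."""
    return (ref, alt) in EDITING_LIKE


def flank(reference: str, pos: int, k: int) -> str:
    """Return the 2k+1 bases centred on 1-based `pos`, padded with 'N' at the ends."""
    i = pos - 1
    left = reference[max(0, i - k):i]
    right = reference[i + 1:i + 1 + k]
    return "N" * (k - len(left)) + left + reference[i] + right + "N" * (k - len(right))


def one_hot(seq: str) -> List[int]:
    """Encode each base as four 0/1 values (A, C, G, T); 'N' becomes all zeros."""
    vec: List[int] = []
    for base in seq.upper():
        vec.extend(1 if base == b else 0 for b in BASES)
    return vec


# Toy input (hypothetical): one reference sequence and three variant records.
reference = "CCGTACGTTA"
vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t5\t.\tA\tG\t50\tPASS\t.",
    "chr1\t7\t.\tG\tA\t50\tPASS\t.",
    "chr1\t9\t.\tT\tC\t50\tPASS\t.",
]

candidates = [s for s in parse_snvs(vcf) if is_editing_candidate(s[2], s[3])]
features = [one_hot(flank(reference, pos, 2)) for _, pos, _, _ in candidates]
```

Each row of `features` could then be passed, alongside whatever read-level filters are applied, to a classifier; the abstract does not say which features or filter thresholds TAE-ML actually uses.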
