RNA editing is a post-transcriptional modification that makes the mature RNA sequence differ from its template DNA sequence. RNA editing events are critical in various biological and biochemical mechanisms and can expand transcriptomic and proteomic diversity, with effects ranging from altered gene regulation to mutations. A-to-I (adenosine-to-inosine) RNA editing is now widely detected and quantified on a global scale and has gained much attention. A deeper understanding of this process is still needed, because detection from high-throughput next-generation sequencing data relies on genomic annotations and prior-knowledge-based filtering steps that are often insufficient, and because it also requires knowing which of the three main classes an editing site belongs to: ALU (in Alu repetitive elements), REP (in non-Alu repetitive elements), or NONREP (in non-repetitive regions). This study proposes deep learning approaches that address these issues by learning motif patterns and identifying regions of A-to-I editing events in biological sequences. Using datasets derived from the public REDIportal database, 300,000 editing sites were divided equally and used to learn sequence and structure motifs, which were visualized through convolutional kernels. We explored changes in RNA editing patterns using the positional and class enrichment of the learned motifs across the three classes. We demonstrated that these approaches, applied to large-scale RNA sequencing data, offer excellent classification accuracy: well-optimized convolutional neural network (CNN) and recurrent neural network (RNN) classifiers obtained average areas under the curve (AUC) of 0.960 and 0.962, respectively. The findings help decipher the principles underlying RNA editing events and will facilitate more effective RNA sequencing research.
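To make the idea of a convolutional kernel acting as a motif detector more concrete, the following is a minimal sketch in plain Python. It is not the study's implementation: the function names `one_hot` and `scan`, the example sequence, and the toy 3-mer kernel are all hypothetical, chosen only to show how a kernel slid along a one-hot encoded RNA sequence scores each window.

```python
# Illustrative sketch only: names, sequence, and kernel are assumptions,
# not taken from the study.
ALPHABET = "ACGU"


def one_hot(seq):
    """Encode an RNA sequence as rows of 4 values, ordered A, C, G, U.

    Any character outside the alphabet becomes an all-zero row.
    """
    return [
        [1.0 if base == letter else 0.0 for letter in ALPHABET]
        for base in seq.upper()
    ]


def scan(encoded, kernel):
    """Slide a (width x 4) kernel along the sequence.

    Returns one score per window: the sum of element-wise products,
    which is what a 1D convolution without padding computes.
    """
    width = len(kernel)
    scores = []
    for start in range(len(encoded) - width + 1):
        score = sum(
            encoded[start + i][j] * kernel[i][j]
            for i in range(width)
            for j in range(len(ALPHABET))
        )
        scores.append(score)
    return scores


# Toy kernel whose weights equal the one-hot encoding of an arbitrary 3-mer,
# so its score counts how many positions of a window match that 3-mer.
kernel = one_hot("UAG")
scores = scan(one_hot("GCUAGCA"), kernel)
best = max(range(len(scores)), key=scores.__getitem__)
print(scores, best)
```

In a trained network the kernel weights are learned rather than fixed, and the highest-scoring windows for each kernel are what would typically be aligned to visualize a learned motif.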