Abstract

Background: Pseudouridine (Ψ) is a common ribonucleotide modification that plays a significant role in many biological processes. Identifying Ψ modification sites is important for research on disease mechanisms and biological processes, and machine learning algorithms are desirable for this task because laboratory techniques are expensive and time-consuming. Results: In this work, we propose a deep learning framework, called PseUdeep, to identify Ψ sites in three species: H. sapiens, S. cerevisiae, and M. musculus. Three encoding methods are used to extract features from RNA sequences: one-hot encoding, K-tuple nucleotide frequency pattern, and position-specific nucleotide composition. The three feature matrices are convolved twice and fed into a capsule neural network and a bidirectional gated recurrent unit network with a self-attention mechanism for classification. Conclusion: Compared with other state-of-the-art methods, our model achieves the highest prediction accuracy: on the independent testing data set S-200 the accuracy improves by 12.38%, and on the independent testing data set H-200 it improves by 0.68%. Moreover, the dimensions of the features we derive from the RNA sequences are only 109, 109, and 119 for H. sapiens, M. musculus, and S. cerevisiae, respectively, which is much smaller than those used in traditional algorithms. On evaluation via tenfold cross-validation and two independent testing data sets, PseUdeep outperforms the best traditional machine learning model available. PseUdeep source code and data sets are available at https://github.com/dan111262/PseUdeep.
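
To make the first two encodings named in the abstract concrete, the sketch below shows one plausible reading of one-hot encoding and K-tuple nucleotide frequency for a short RNA window. It is an illustration only and not the PseUdeep implementation: the function names, the example sequence, and the choice of k = 2 are assumptions made here, and the position-specific nucleotide composition is left out because it depends on labelled training sequences.

```python
from itertools import product

import numpy as np

NUCLEOTIDES = "ACGU"


def one_hot_encode(seq):
    """Return a (len(seq), 4) matrix with a single 1 per row marking the base."""
    index = {base: i for i, base in enumerate(NUCLEOTIDES)}
    matrix = np.zeros((len(seq), len(NUCLEOTIDES)), dtype=np.float32)
    for pos, base in enumerate(seq.upper()):
        if base in index:  # unknown symbols (e.g. N) stay as all-zero rows
            matrix[pos, index[base]] = 1.0
    return matrix


def k_tuple_frequency(seq, k=2):
    """Return the normalised frequency of every overlapping k-tuple in seq."""
    seq = seq.upper()
    tuples = ["".join(p) for p in product(NUCLEOTIDES, repeat=k)]
    counts = dict.fromkeys(tuples, 0)
    windows = len(seq) - k + 1
    for i in range(windows):
        kmer = seq[i:i + k]
        if kmer in counts:
            counts[kmer] += 1
    total = max(windows, 1)  # guard against sequences shorter than k
    return np.array([counts[t] / total for t in tuples], dtype=np.float32)


# Hypothetical 11-nt window whose centre position (index 5) is a uridine.
example = "GAUCUUAGCUA"
one_hot = one_hot_encode(example)
dinucleotide_freq = k_tuple_frequency(example, k=2)
```

Here `one_hot` has one row per nucleotide and `dinucleotide_freq` has one entry per possible dinucleotide; how the published model arranges these features into the 109- and 119-dimensional inputs is not specified in this abstract.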

Highlights

  • Pseudouridine (Ψ) is one of the most prevalent RNA modifications; it occurs at the uridine base through an isomerization reaction catalyzed by pseudouridine synthases (Bousquet-Antonelli et al., 1997; Chan and Huang, 2009; Ge and Yu, 2013; Kiss et al., 2010; Wolin, 2016; Yu and Meier, 2014)

  • It is confirmed that Ψ modification occurs in several kinds of RNAs, such as small nuclear RNA, rRNA, tRNA, mRNA, and small nucleolar RNA (Ge and Yu, 2013)

  • We propose a model, PseUdeep, which can effectively identify Ψ sites in RNA sequences


Summary

Introduction

Pseudouridine (Ψ) is one of the most prevalent RNA modifications; it occurs at the uridine base through an isomerization reaction catalyzed by pseudouridine synthases (see Figure 1) (Bousquet-Antonelli et al., 1997; Chan and Huang, 2009; Ge and Yu, 2013; Kiss et al., 2010; Wolin, 2016; Yu and Meier, 2014). It is confirmed that Ψ modification occurs in several kinds of RNAs, such as small nuclear RNA, rRNA, tRNA, mRNA, and small nucleolar RNA (Ge and Yu, 2013). Ψ plays a significant role in many biological processes, including regulating the stability of RNA structure in tRNA and rRNA (Kierzek et al., 2014). The identification of Ψ modification sites would therefore be of great benefit for research on disease mechanisms and biological processes. Because laboratory techniques for locating these sites are expensive and time-consuming, machine learning algorithms are desirable.
