Abstract
RNA pseudouridine modification is important in a variety of cellular biological and physiological processes. It plays a significant role in RNA function, RNA structure stabilization, translation, and related processes. To understand its functional mechanisms, it is necessary to accurately identify pseudouridine sites in RNA sequences. Although some computational methods have been proposed for identifying pseudouridine sites, improving identification accuracy and generalization ability remains a challenge. To address this challenge, a novel feature fusion predictor, named PsoEL-PseU, is proposed for the prediction of pseudouridine sites. First, this study systematically explored different types of feature descriptors and selected six descriptors with different properties. To improve the feature representation ability, a binary particle swarm optimizer was used to find the optimal feature subset for each of the six descriptors. Second, six individual predictors were trained, one on each of the six optimal feature subsets. Finally, to combine the contributions of all six feature types, the six individual predictors were fused into an ensemble predictor by a parallel fusion strategy. Ten-fold cross-validation on three benchmark datasets indicated that the PsoEL-PseU predictor significantly outperformed the current state-of-the-art predictors. In the independent dataset evaluation, the new predictor also achieved significantly higher accuracy than existing predictors. A user-friendly web server for the PsoEL-PseU predictor has been made freely accessible.
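The feature selection step can be illustrated with a minimal sketch of a standard binary particle swarm optimizer (sigmoid transfer function). This is not the authors' implementation: the function names, hyperparameters (`w`, `c1`, `c2`, `v_max`), and the toy fitness function are assumptions made for illustration only. In practice the fitness would more plausibly be the cross-validated accuracy of a classifier trained on the selected feature columns.

```python
import math
import random


def binary_pso(fitness, n_features, n_particles=20, n_iters=50,
               w=0.7, c1=1.5, c2=1.5, v_max=4.0, seed=0):
    """Maximize fitness(mask) over 0/1 masks of length n_features."""
    rng = random.Random(seed)
    # Random initial bit masks and zero velocities.
    pos = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    # Personal bests and the global best seen so far.
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                # Clamp the velocity so the sigmoid does not saturate.
                v = max(-v_max, min(v_max, v))
                vel[i][d] = v
                # Sigmoid transfer: velocity -> probability the bit is 1.
                prob = 1.0 / (1.0 + math.exp(-v))
                pos[i][d] = 1 if rng.random() < prob else 0
            fit = fitness(pos[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit


# Hypothetical toy fitness: features 0, 3, and 5 are "informative";
# every selected feature costs a small penalty.
INFORMATIVE = (0, 3, 5)


def toy_fitness(mask):
    return sum(mask[i] for i in INFORMATIVE) - 0.1 * sum(mask)


best_mask, best_fit = binary_pso(toy_fitness, n_features=8, seed=1)
print(best_mask, best_fit)
```

Each particle is a bit mask over one descriptor's features, so running the search once per descriptor would yield the six feature subsets described above.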
Highlights
With the rapid development of sequencing technology, the identification of RNA pseudouridine sites has gradually become one of the most significant areas in transcriptome research
The S. cerevisiae dataset has a larger feature space, so its performance should improve more markedly when a heuristic search method explores that space more thoroughly to select the optimal feature subset
The PsoEL-PseU predictor was proposed as a novel feature fusion predictor for the prediction of pseudouridine sites
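As a rough illustration of fusing several predictors in parallel, the sketch below averages the positive-class probabilities of the individual predictors and thresholds the result. The text here does not specify the fusion rule, so the (optionally weighted) averaging, the `threshold` value, and the function name `fuse_parallel` are all assumptions rather than the method actually used.

```python
def fuse_parallel(prob_lists, weights=None, threshold=0.5):
    """Fuse per-predictor probabilities into one ensemble decision.

    prob_lists[k][i] is predictor k's probability that sample i is a
    pseudouridine site. Returns (fused_probabilities, labels).
    """
    n_pred = len(prob_lists)
    if weights is None:
        # Assumption: equal weight for every individual predictor.
        weights = [1.0 / n_pred] * n_pred
    n_samples = len(prob_lists[0])
    fused = [sum(w * probs[i] for w, probs in zip(weights, prob_lists))
             for i in range(n_samples)]
    labels = [1 if p >= threshold else 0 for p in fused]
    return fused, labels


# Hypothetical outputs of two predictors on two candidate sites.
fused, labels = fuse_parallel([[0.9, 0.2], [0.7, 0.4]])
print(fused, labels)
```

With six predictors, `prob_lists` would hold six probability lists, one per feature descriptor.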
Summary
With the rapid development of sequencing technology, the identification of RNA pseudouridine sites has gradually become one of the most significant areas in transcriptome research. Pseudouridine sites are considered to be among the most basic RNA modification sites found in prokaryotes and eukaryotes [2]. As one of the most abundant post-transcriptional modifications, pseudouridylation plays an important role in the structure, function, and metabolism of RNA [3,4,5,6]. Studying pseudouridine modification sites is important for revealing the underlying biological principles, for instance their role in stress response and in stabilizing RNA structure. Considering the rapidly increasing amount of data generated in the post-genome era, it is necessary to build computational tools that can identify pseudouridine sites efficiently. Several fast and inexpensive methods for predicting