Abstract

Pseudouridine (Ψ) is one of the most abundant RNA modifications. It occurs across a wide range of organisms and participates in many biological processes. Identifying pseudouridine sites is important for the study of biological processes and for RNA-related drug development. Although wet-lab experiments can detect Ψ sites across the whole transcriptome, they are expensive, time-consuming and easily influenced by the environment. Several computational methods have been proposed, but their performance remains unsatisfactory and their computational cost is usually high. In this article, we propose an incremental identification method called PA-PseU, which uses the Passive-Aggressive algorithm as the Ψ site classifier. A combination of the chi-square test and logistic regression is used to select optimal feature subsets. Its effectiveness has been demonstrated via 10-fold cross-validation, the jackknife test and an independent test. Averaged across species, the leave-one-out accuracy of PA-PseU is 87.80% on the cross-validation sets and 86.70% on the independent testing sets, improvements of 16.40 and 12.00 percentage points over the 71.40% and 74.70% achieved by the best previous predictor, RF-PseU. These results show that our model substantially outperforms state-of-the-art methods. Moreover, because PA-PseU is incremental, it has a low computational cost and runs quickly. PA-PseU can be a useful tool for identifying Ψ sites in transcriptome analysis. All data, code and related materials are available at https://github.com/Jensen-Wang/PA-PseU.
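
To illustrate what "incremental" means for a Passive-Aggressive classifier, the following is a minimal sketch of the standard PA-I update rule in plain Python. It is not taken from the PA-PseU repository: the class name `PassiveAggressive`, the parameter `C`, and the toy feature vectors are all assumptions made for illustration, and the abstract does not say which Passive-Aggressive variant or sequence encoding PA-PseU uses. The chi-square and logistic regression feature selection step is not shown.

```python
from typing import List, Sequence


class PassiveAggressive:
    """Binary PA-I classifier with labels in {-1, +1} (illustrative sketch)."""

    def __init__(self, n_features: int, C: float = 1.0) -> None:
        self.w: List[float] = [0.0] * n_features
        self.C = C  # upper bound on the step size of any single update

    def score(self, x: Sequence[float]) -> float:
        # Signed margin w . x
        return sum(wi * xi for wi, xi in zip(self.w, x))

    def predict(self, x: Sequence[float]) -> int:
        return 1 if self.score(x) >= 0 else -1

    def partial_fit(self, x: Sequence[float], y: int) -> float:
        """Update on one example and return the hinge loss seen before the update."""
        loss = max(0.0, 1.0 - y * self.score(x))
        norm_sq = sum(xi * xi for xi in x)
        if loss > 0.0 and norm_sq > 0.0:
            # "Aggressive": move just far enough to fix the margin, capped at C.
            tau = min(self.C, loss / norm_sq)
            self.w = [wi + tau * y * xi for wi, xi in zip(self.w, x)]
        # "Passive": when the loss is zero, the weights are left unchanged.
        return loss


# Hypothetical toy vectors standing in for encoded RNA windows (not real data).
stream = [
    ([1.0, 0.0, 1.0], 1),   # stand-in for a Ψ site
    ([0.0, 1.0, 0.0], -1),  # stand-in for a non-Ψ site
    ([1.0, 0.0, 0.0], 1),
    ([0.0, 1.0, 1.0], -1),
]

model = PassiveAggressive(n_features=3, C=1.0)
for _ in range(5):
    for x, y in stream:
        model.partial_fit(x, y)  # one example at a time, no retraining from scratch

predictions = [model.predict(x) for x, _ in stream]
```

Each call to `partial_fit` touches only the current example and the weight vector, so the cost per update is linear in the number of features. This is the property that lets a model of this kind absorb new labelled sites without revisiting earlier data.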
