The epitranscriptome, also known as chemical modifications of RNA (CMRs), is a newly discovered layer of gene regulation, the biological importance of which emerged through analysis of only a small fraction of CMRs detected by high-throughput sequencing technologies. Understanding of the epitranscriptome is hampered by the absence of computational tools for the systematic analysis of epitranscriptome sequencing data. In addition, no tools have yet been designed for accurate prediction of CMRs in plants, or to extend epitranscriptome analysis from a fraction of the transcriptome to its entirety. Here, we introduce PEA, an integrated R toolkit to facilitate the analysis of plant epitranscriptome data. The PEA toolkit contains a comprehensive collection of functions required for read mapping, CMR calling, motif scanning and discovery and gene functional enrichment analysis. PEA also takes advantage of machine learning (ML) technologies for transcriptome-scale CMR prediction, with high prediction accuracy, using the Positive Samples Only Learning algorithm, which addresses the two-class classification problem by using only positive samples (CMRs), in the absence of negative samples (non-CMRs). Hence PEA is a versatile epitranscriptome analysis pipeline covering CMR calling, prediction and annotation and we describe its application to predict N6-methyladenosine (m6A) modifications in Arabidopsis thaliana. Experimental results demonstrate that the toolkit achieved 71.6% sensitivity and 73.7% specificity, which is superior to existing m6A predictors. PEA is potentially broadly applicable to the in-depth study of epitranscriptomics. PEA Docker image is available at https://hub.docker.com/r/malab/pea, source codes and user manual are available at https://github.com/cma2015/PEA. Supplementary data are available at Bioinformatics online.
Read full abstract