Abstract

Background: Polyadenylation is a critical stage of RNA processing during the formation of mature mRNA, and is present in most of the known eukaryote protein-coding transcripts and many long non-coding RNAs. The correct identification of poly(A) signals (PAS) not only helps to elucidate the 3′-end genomic boundaries of a transcribed DNA region and gene regulatory mechanisms but also gives insight into the multiple transcript isoforms resulting from alternative PAS. Although progress has been made in the in-silico prediction of genomic signals, the recognition of PAS in DNA genomic sequences remains a challenge.

Results: In this study, we analyzed human genomic DNA sequences for the 12 most common PAS variants. Our analysis has identified a set of features that helps in the recognition of true PAS, which may be involved in the regulation of the polyadenylation process. The proposed features, in combination with a recognition model, resulted in a novel method and tool, Omni-PolyA. Omni-PolyA combines several machine learning techniques such as different classifiers in a tree-like decision structure and genetic algorithms for deriving a robust classification model. We performed a comparison between results obtained by state-of-the-art methods, deep neural networks, and Omni-PolyA. Results show that Omni-PolyA significantly reduced the average classification error rate by 35.37% in the prediction of the 12 considered PAS variants relative to the state-of-the-art results.

Conclusions: The results of our study demonstrate that Omni-PolyA is currently the most accurate model for the prediction of PAS in human and can serve as a useful complement to other PAS recognition methods. Omni-PolyA is publicly available as an online tool accessible at www.cbrc.kaust.edu.sa/omnipolya/.

Highlights

  • Polyadenylation is a critical stage of RNA processing during the formation of mature mRNA, and is present in most of the known eukaryote protein-coding transcripts and many long non-coding RNAs

  • The correct identification of poly(A) signals (PAS) helps to elucidate the 3′-end boundaries of a gene and its regulatory mechanisms, and gives insight into the multiple isoforms resulting from alternative PAS

  • An accurate tool for PAS prediction from genomic DNA sequences would be of great help in real applications, e.g., for computationally finding alternative PAS or as a component of gene-finding tools


Summary

Introduction

Polyadenylation is a critical stage of RNA processing during the formation of mature mRNA, and is present in most of the known eukaryote protein-coding transcripts and many long non-coding RNAs. The correct identification of poly(A) signals (PAS) helps to elucidate the 3′-end genomic boundaries of a transcribed DNA region and gene regulatory mechanisms and gives insight into the multiple transcript isoforms resulting from alternative PAS. In contrast to the conserved PAS hexamers, the downstream and upstream sequence elements are highly variable in sequence composition and have not yet been adequately characterized [4]. This sequence variability of the regions flanking PAS is a major obstacle to the computational prediction of such signals in genomic DNA sequences. An accurate predictive model of PAS would help in the identification of PAS for transcripts containing premature termination codons, which are degraded by cellular mechanisms [20].
