Abstract

Polyadenylation [poly(A)] of mRNA is a critical step in gene expression and plays an important role in transcription termination. Predicting poly(A) sites can help identify the 3' ends of genes and improve genome annotation. Because knowledge of poly(A) signals in plants is limited, predictive modeling of poly(A) sites in agricultural crops remains challenging. Recent studies have uncovered widespread alternative poly(A) (APA) sites in introns and coding sequences (CDS), yet work on predicting these APA sites is scarce. In this study, four feature representation methods (a position weight matrix, k-gram frequency, core hexamers, and a transition matrix) were adopted to characterize the poly(A) signals surrounding APA sites. A classification model was built to predict each group of APA sites. Experimental results showed that the model was effective in identifying APA sites located in different genomic regions, with a compromise between sensitivity and specificity higher than 87%. Compared with the previous model PASS rice, prediction accuracy for APA sites in the 3'-UTR, introns, and CDS improved by 5%, 7%, and 27%, respectively. This model will contribute to genetic engineering by enabling researchers to control poly(A) site selection.
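
As an illustration of three of the feature representations named above, the following is a minimal sketch, not the authors' implementation. The abstract does not specify window sizes, the value of k, the order of the transition matrix, or any smoothing, so the function names, the default `k=2`, the first-order transitions, and the `pseudocount` parameter are all assumptions made for the example. Sequences are assumed to be uppercase DNA over A, C, G, T.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"  # assumption: uppercase DNA; other symbols are ignored


def kgram_frequencies(seq, k=2):
    """Relative frequency of each k-gram over ACGT, using overlapping windows."""
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers)
    return {m: (counts[m] / total if total else 0.0) for m in kmers}


def transition_matrix(seq):
    """First-order transition probabilities P(next | current), row-normalized."""
    counts = {a: {b: 0 for b in ALPHABET} for a in ALPHABET}
    for a, b in zip(seq, seq[1:]):
        if a in counts and b in counts[a]:
            counts[a][b] += 1
    matrix = {}
    for a in ALPHABET:
        row_total = sum(counts[a].values())
        matrix[a] = {
            b: (counts[a][b] / row_total if row_total else 0.0)
            for b in ALPHABET
        }
    return matrix


def position_weight_matrix(aligned, pseudocount=1.0):
    """Per-position nucleotide probabilities from equal-length aligned sequences.

    The pseudocount (an assumption of this sketch) avoids zero probabilities.
    """
    length = len(aligned[0])
    pwm = []
    for pos in range(length):
        column = Counter(s[pos] for s in aligned)
        denom = sum(column[b] for b in ALPHABET) + pseudocount * len(ALPHABET)
        pwm.append({b: (column[b] + pseudocount) / denom for b in ALPHABET})
    return pwm


if __name__ == "__main__":
    window = "AATAAA"  # toy sequence, not data from the study
    print(kgram_frequencies(window, k=2)["AA"])
    print(transition_matrix(window)["A"])
    print(position_weight_matrix(["AAT", "AAA"])[0])
```

In a pipeline of this kind, the values from each representation would typically be flattened and concatenated into one feature vector per candidate site before being passed to a classifier; how the study combines them is not stated in the abstract.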
