Abstract

It is challenging to identify the smallest microexons (≤15-nt) due to their small size. Consequently, these microexons are often misannotated or missed entirely during genome annotation. Here, we develop a pipeline to accurately identify 2,398 small microexons in 10 diverse plant species using 990 RNA-seq datasets, and most of them have not been annotated in the reference genomes. Analysis reveals that microexons tend to have increased detained flanking introns that require post-transcriptional splicing after polyadenylation. Examination of 45 conserved microexon clusters demonstrates that microexons and associated gene structures can be traced back to the origin of land plants. Based on these clusters, we develop an algorithm to genome-wide model coding microexons in 132 plants and find that microexons provide a strong phylogenetic signal for plant organismal relationships. Microexon modeling reveals diverse evolutionary trajectories, involving microexon gain and loss and alternative splicing. Our work provides a comprehensive view of microexons in plants.

Highlights

  • It is challenging to identify the smallest microexons (≤15 nt) due to their small size

  • We considered the smallest microexons (1–15 nt) that are most likely to be missed in genome annotations and transcriptome studies

  • We developed a pipeline that combined additional splicing junctions identified by OLego with existing genome-annotated junctions, which was used to guide STAR or HISAT2 mapping
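The junction-combining step in the last highlight can be sketched as a set union. This is a minimal illustration, not the authors' code: the `Junction` type, the 1-based intron coordinate convention, and the example coordinates are all assumptions. The four-column output (chromosome, intron start, intron end, strand) matches the layout STAR documents for its `--sjdbFileChrStartEnd` option; whether the pipeline used that option is not stated here.

```python
from typing import Iterable, List, NamedTuple


class Junction(NamedTuple):
    """One splice junction, described by the intron it spans (hypothetical type)."""
    chrom: str
    intron_start: int  # first intronic base, 1-based (assumed convention)
    intron_end: int    # last intronic base, 1-based (assumed convention)
    strand: str        # "+" or "-"


def merge_junctions(annotated: Iterable[Junction],
                    discovered: Iterable[Junction]) -> List[Junction]:
    """Return the de-duplicated union of both junction sets, sorted by position."""
    return sorted(set(annotated) | set(discovered))


def to_tsv(junctions: Iterable[Junction]) -> str:
    """Serialize junctions as tab-separated lines: chrom, start, end, strand."""
    return "".join(
        f"{j.chrom}\t{j.intron_start}\t{j.intron_end}\t{j.strand}\n"
        for j in junctions
    )


# Made-up coordinates, for illustration only.
annotated = [Junction("chr1", 101, 200, "+"), Junction("chr1", 401, 500, "+")]
discovered = [Junction("chr1", 101, 200, "+"), Junction("chr1", 210, 300, "+")]

merged = merge_junctions(annotated, discovered)
junction_table = to_tsv(merged)
```

A junction present in both inputs appears only once in `merged`, so the aligner sees each candidate intron a single time regardless of which source reported it.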


Summary

Introduction

It is challenging to identify the smallest microexons (≤15 nt) due to their small size. We develop a pipeline to accurately identify 2,398 small microexons in 10 diverse plant species using 990 RNA-seq datasets, most of which have not been annotated in the reference genomes. RNA-seq analysis integrating multiple sequencing technologies identified tissue-specific alternative splicing of a 45-nt microexon located within the AP2 domain of the RAP2-7 protein, which fine-tunes DNA-binding activity in cotton[17]. These works defined microexons by using 51-nt as the minimum length cutoff and did not pay special attention to the smallest microexons (1–15 nt), which are usually missed in genome annotations and transcriptome studies, making it difficult to correctly predict the function of the corresponding proteins. Plants have fundamentally different tissue types compared with animals, and there are unlikely to be plant homologs of animal neurogenesis factors such as nSR100.
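To make the 1–15 nt size class concrete, the sketch below derives internal exons from the ordered introns of a single spliced alignment and keeps those within the size limit. It is a simplified illustration under stated assumptions (1-based inclusive intron coordinates, hypothetical function names, invented example values) and is not the identification pipeline described above.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), 1-based inclusive (assumed convention)


def internal_exons(introns: List[Interval]) -> List[Interval]:
    """Exons lying between consecutive introns of one spliced alignment.

    `introns` must be sorted by position and non-overlapping.
    """
    return [
        (prev_end + 1, next_start - 1)
        for (_, prev_end), (next_start, _) in zip(introns, introns[1:])
    ]


def small_microexons(introns: List[Interval], max_len: int = 15) -> List[Interval]:
    """Internal exons whose length is between 1 and `max_len` nucleotides."""
    return [
        (start, end)
        for start, end in internal_exons(introns)
        if 1 <= end - start + 1 <= max_len
    ]


# Invented example: three introns flanking a 9-nt exon and a 100-nt exon.
example_introns = [(101, 200), (210, 300), (401, 500)]
exons = internal_exons(example_introns)
micro = small_microexons(example_introns)
```

Here the exon at 201–209 is 9 nt long and is retained, while the 100-nt exon at 301–400 is not. An exon this short can only be recovered when a read spans both flanking introns, which is one reason such exons are easy to miss.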

