Abstract

Transcript-based annotations of genes facilitate both genome-wide analyses and detailed single-locus research. In contrast, transposable element (TE) annotations are rudimentary, consisting of information only on TE location and type. The repetitiveness and limited annotation of TEs make it difficult to distinguish between potentially functional expressed elements and degraded copies. To improve genome-wide TE bioinformatics, we performed long-read sequencing of cDNAs from Arabidopsis (Arabidopsis thaliana) lines deficient in multiple layers of TE repression. These uniquely mapping transcripts were used to identify the set of TEs able to generate polyadenylated RNAs and to create a new transcript-based annotation of TEs that we have layered upon the existing high-quality community standard annotation. We used this annotation to reduce the bioinformatic complexity associated with multimapping reads from short-read RNA sequencing experiments, and we show that this improvement is even greater in a TE-rich genome such as maize (Zea mays). Our TE annotation also enables the testing of specific standing hypotheses in the TE field. We demonstrate that inaccurate TE splicing does not trigger small RNA production, and that the cell more strongly targets DNA methylation to TEs that have the potential to make mRNAs. This work provides a transcript-based TE annotation for Arabidopsis and maize, which serves as a blueprint to reduce the bioinformatic complexity associated with repetitive TEs in any organism.

Highlights

  • A consistent problem with the analysis of eukaryotic genomes is the complexity introduced by transposable elements (TEs)

  • We began by isolating total RNA, purifying polyadenylated mRNAs, and performing Oxford Nanopore Technologies (ONT) sequencing of full-length cDNAs from five Arabidopsis genotypes (Figure 1A; see Methods; sequencing statistics in Supplemental Table 1)

  • We have demonstrated two biological consequences of this improved annotation

Introduction

A consistent problem with the analysis of eukaryotic genomes is the complexity introduced by transposable elements (TEs). Thousands to millions of TEs are present in eukaryotic genomes, often nested in convoluted arrangements. Analysis of these regions is cumbersome because of their repetitive nature (many similar or even identical elements) and because current TE annotations describe only the bare minimum of TE information. In contrast, gene annotations are regularly based on transcript information, which is missing for TEs. These gene annotations describe the transcriptional start sites (TSSs), polyadenylation site, direction and splicing pattern, which give genes higher resolution in bioinformatic experiments compared to TEs. The lack of transcript information for TEs hampers downstream bioinformatics, leading many researchers to ignore these regions of the genome altogether.
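To make the contrast concrete, here is a minimal Python sketch of the two kinds of record described above: a location-and-type entry versus a transcript-based entry from which the TSS, polyadenylation site, direction and splicing pattern can be read. The class names, field names and coordinates are hypothetical illustrations and are not taken from the authors' annotation files or any particular file format.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class LocusAnnotation:
    """Location-and-type record: the minimal information a TE annotation carries."""
    chrom: str
    start: int      # 1-based, inclusive
    end: int        # 1-based, inclusive
    te_type: str    # e.g. a superfamily label (illustrative)


@dataclass
class TranscriptAnnotation:
    """Transcript-based record: strand plus exon structure."""
    chrom: str
    strand: str                    # '+' or '-'
    exons: List[Tuple[int, int]]   # 1-based inclusive, sorted by start coordinate

    @property
    def tss(self) -> int:
        # The transcript starts at the leftmost base on '+', the rightmost on '-'.
        return self.exons[0][0] if self.strand == "+" else self.exons[-1][1]

    @property
    def polya_site(self) -> int:
        # The transcript ends at the opposite extreme from the TSS.
        return self.exons[-1][1] if self.strand == "+" else self.exons[0][0]

    @property
    def introns(self) -> List[Tuple[int, int]]:
        # Introns are the gaps between consecutive exons (the splicing pattern).
        return [
            (left_end + 1, right_start - 1)
            for (_, left_end), (right_start, _) in zip(self.exons, self.exons[1:])
        ]


# Hypothetical coordinates, for illustration only.
locus = LocusAnnotation("Chr1", 100, 400, "LTR/Gypsy")
transcript = TranscriptAnnotation("Chr1", "-", [(100, 200), (300, 400)])
print(transcript.tss, transcript.polya_site, transcript.introns)
```

The locus record can say only where an element lies, whereas the transcript record also fixes direction, both transcript ends and the intron coordinates, which is the extra resolution the text attributes to gene annotations.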
