Abstract

Endogenous pararetroviral sequences are the most commonly found virus sequences integrated into angiosperm genomes. We describe an endogenous pararetrovirus (EPRV) repeat in Fritillaria imperialis, a species that is under study as a result of its exceptionally large genome (1C = 42 096 Mbp, approximately 240 times bigger than Arabidopsis thaliana). The repeat (FriEPRV) was identified from Illumina reads using the RepeatExplorer pipeline, and exists in a complex genomic organization at the centromere of most, or all, chromosomes. The repeat was reconstructed into three consensus sequences that formed three interconnected loops, one of which carries sequence motifs expected of an EPRV (including the gag and pol domains). FriEPRV shows sequence similarity to members of the Caulimoviridae pararetrovirus family, with phylogenetic analysis indicating a close relationship to Petuvirus. It is possible that no complete EPRV sequence exists, although our data suggest an abundance that exceeds the genome size of Arabidopsis. Analysis of single nucleotide polymorphisms revealed elevated levels of C→T and G→A transitions, consistent with deamination of methylated cytosine. Bisulphite sequencing revealed high levels of methylation at CG and CHG motifs (up to 100%), and 15-20% methylation, on average, at CHH motifs. FriEPRV's centromeric location may suggest targeted insertion, perhaps associated with meiotic drive. We observed an abundance of 24 nt small RNAs that specifically target FriEPRV, potentially providing a signature of RNA-dependent DNA methylation. Such signatures of epigenetic regulation suggest that the huge genome of F. imperialis has not arisen as a consequence of a catastrophic breakdown in the regulation of repeat amplification.
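The CG, CHG and CHH methylation contexts reported above follow the standard convention in which H stands for A, C or T. As an illustration only, and not the authors' pipeline, the minimal Python sketch below classifies forward-strand cytosines by context and pools per-site read counts into a methylation fraction per context. The function names, the `calls` input layout and the example sequence and counts are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Optional, Tuple

# IUPAC code H = A, C or T (any base except G).
H_BASES = set("ACT")


def cytosine_context(seq: str, i: int) -> Optional[str]:
    """Return 'CG', 'CHG' or 'CHH' for the forward-strand cytosine at index i.

    Returns None if position i is not a C, or if the context cannot be
    determined (sequence end or ambiguous bases such as N).
    """
    seq = seq.upper()
    if i < 0 or i >= len(seq) or seq[i] != "C":
        return None
    first = seq[i + 1] if i + 1 < len(seq) else ""
    second = seq[i + 2] if i + 2 < len(seq) else ""
    if first == "G":
        return "CG"
    if first in H_BASES and second == "G":
        return "CHG"
    if first in H_BASES and second in H_BASES:
        return "CHH"
    return None


def methylation_by_context(
    seq: str, calls: Dict[int, Tuple[int, int]]
) -> Dict[str, float]:
    """Pool read counts per context and return the methylated fraction.

    `calls` maps a 0-based position to (methylated_reads, total_reads);
    this layout is an assumption made for the example.
    """
    methylated = defaultdict(int)
    total = defaultdict(int)
    for pos, (m, n) in calls.items():
        ctx = cytosine_context(seq, pos)
        if ctx is None or n == 0:
            continue
        methylated[ctx] += m
        total[ctx] += n
    return {ctx: methylated[ctx] / total[ctx] for ctx in total}


# Hypothetical toy data, not taken from the study.
example_seq = "ACGTCAGCTTC"
example_calls = {1: (9, 10), 4: (8, 10), 7: (2, 10), 10: (5, 10)}
levels = methylation_by_context(example_seq, example_calls)
```

The sketch handles the forward strand only. A full analysis would also classify cytosines on the reverse strand (read as G on the forward strand) and would typically start from the output of a dedicated bisulphite read aligner.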

Highlights

  • Viral sequences from two virus families, the Caulimoviridae and the Geminiviridae, are known to have integrated into the nuclear genomes of angiosperms

  • All contigs of FriEPRV are contained in Supporting Data file S1

  • The cluster graph of FriEPRV consists of three distinct loops, each of which comprises a set of overlapping or similar sequence reads

Introduction

Viral sequences from two virus families, the Caulimoviridae (pararetroviruses) and the Geminiviridae, are known to have integrated into the nuclear genomes of angiosperms. These endogenous viral sequences are typically present in a few hundred or thousand copies, contributing to the repetitive genome content. They show varying levels of degradation and rearrangements, and, in most cases, are functionally defective. Repetitive DNA amplification processes and the efficiency of repeat removal are generally accepted to account for genome size differences in angiosperms of the same ploidy, for example in cotton (Gossypium hirsutum; Hawkins et al, 2009), Arabidopsis lyrata (Hu et al, 2011) and Nicotiana tabacum (Renny-Byfield et al, 2011).
