Abstract
Endogenous pararetroviral sequences are the most commonly found virus sequences integrated into angiosperm genomes. We describe an endogenous pararetrovirus (EPRV) repeat in Fritillaria imperialis, a species that is under study because of its exceptionally large genome (1C = 42 096 Mbp, approximately 240 times bigger than Arabidopsis thaliana). The repeat (FriEPRV) was identified from Illumina reads using the RepeatExplorer pipeline, and exists in a complex genomic organization at the centromere of most, or all, chromosomes. The repeat was reconstructed into three consensus sequences that formed three interconnected loops, one of which carries sequence motifs expected of an EPRV (including the gag and pol domains). FriEPRV shows sequence similarity to members of the Caulimoviridae pararetrovirus family, with phylogenetic analysis indicating a close relationship to Petuvirus. It is possible that no complete EPRV sequence exists, although our data suggest that the total amount of FriEPRV sequence exceeds the genome size of Arabidopsis. Analysis of single nucleotide polymorphisms revealed elevated levels of C→T and G→A transitions, consistent with deamination of methylated cytosine. Bisulphite sequencing revealed high levels of methylation at CG and CHG motifs (up to 100%), and 15–20% methylation, on average, at CHH motifs. FriEPRV's centromeric location may suggest targeted insertion, perhaps associated with meiotic drive. We observed an abundance of 24 nt small RNAs that specifically target FriEPRV, potentially providing a signature of RNA-dependent DNA methylation. Such signatures of epigenetic regulation suggest that the huge genome of F. imperialis has not arisen as a consequence of a catastrophic breakdown in the regulation of repeat amplification.
Highlights
Viral sequences from two virus families, the Caulimoviridae and the Geminiviridae, are known to have integrated into the nuclear genomes of angiosperms
All contigs of FriEPRV are contained in Supporting Data file S1
The cluster graph of FriEPRV consists of three distinct loops, each of which comprises a set of overlapping or similar sequence reads
Summary
Viral sequences from two virus families, the Caulimoviridae (pararetroviruses) and the Geminiviridae, are known to have integrated into the nuclear genomes of angiosperms. These endogenous viral sequences are typically present in a few hundred or a few thousand copies, contributing to the repetitive genome content. They show varying levels of degradation and rearrangement and, in most cases, are functionally defective. Repetitive DNA amplification processes and the efficiency of repeat removal are generally accepted to account for genome size differences in angiosperms of the same ploidy, for example in cotton (Gossypium hirsutum; Hawkins et al., 2009), Arabidopsis lyrata (Hu et al., 2011) and Nicotiana tabacum (Renny-Byfield et al., 2011).