Abstract

Background: The program RepeatMasker and the database Repbase-ISB are part of the most widely used strategy for annotating repeats in animal genomes. They have been used to show that avian genomes have a lower repeat content (8–12 %) than the sequenced genomes of many vertebrate species (30–55 %). However, the efficiency of such library-based strategies depends on the quality and completeness of the sequences in the database that is used. An alternative to these library-based methods is to identify repeats de novo. Such methods have existed for at least a decade and may be more powerful than the library-based methods. We used an annotation strategy involving several complementary de novo tools to determine the repeat content of the model genome galGal4 (1.04 Gbp), including the identification of simple sequence repeats (SSRs), tandem repeats and transposable elements (TEs).

Results: We annotated over 1 Gbp of the galGal4 genome and showed that it is composed of approximately 19 % SSRs and TEs. Furthermore, we estimate that the actual genome of the red jungle fowl contains about 31–35 % repeats. We find that library-based methods tend to overestimate TE diversity. These results have a major impact on the current understanding of how repeats are distributed along chromosomes in the red jungle fowl.

Conclusions: Our results provide a proof of concept for the reliability of de novo tools in annotating repeats in large animal genomes. They have also revealed issues that will need to be resolved in order to develop gold-standard methodologies for annotating repeats in eukaryote genomes.

Electronic supplementary material: The online version of this article (doi:10.1186/s12864-016-3015-5) contains supplementary material, which is available to authorized users.

Highlights

  • The program RepeatMasker and the database Repbase-Institute for System Biology (ISB) are part of the most widely used strategy for annotating repeats in animal genomes

  • DNA reassociation kinetics can be used to estimate a conservative proportion of repeats

  • The 22–24 % repeat proportion estimated for the red jungle fowl (RJF) is only a minimal value because its calculation is limited by two parameters in the experimental procedure [62]


Summary

Introduction

The program RepeatMasker and the database Repbase-ISB are part of the most widely used strategy for annotating repeats in animal genomes. They have been used to show that avian genomes have a lower repeat content (8–12 %) than the sequenced genomes of many vertebrate species (30–55 %). The efficiency of such library-based strategies depends on the quality and completeness of the sequences in the database that is used. An alternative to these library-based methods is to identify repeats de novo. Repeated sequences are the most abundant components of many eukaryote genomes, and they vary in their structure, organization and location in chromosomes. The sequence of the repeated units and the abundance of each family are generally specific to each species [6, 7].
