REPdenovo: Inferring De Novo Repeat Motifs from Short Sequence Reads.

Christophe Antoniewski, Chong Chu, Rasmus Nielsen, Yufeng Wu

doi:10.1371/journal.pone.0150719

Open Access

https://doi.org/10.1371/journal.pone.0150719

Abstract

Repeat elements are important components of eukaryotic genomes. One limitation in our understanding of repeat elements is that most analyses rely on reference genomes that are incomplete and often contain missing data in highly repetitive regions that are difficult to assemble. To overcome this problem, we develop a new method, REPdenovo, which assembles repeat sequences directly from raw shotgun sequencing data. REPdenovo can construct various types of repeats that are highly repetitive and have low sequence divergence within copies. We show that REPdenovo is substantially better than existing methods both in terms of the number and the completeness of the repeat sequences that it recovers. The key advantage of REPdenovo is that it can reconstruct long repeats from sequence reads. We apply the method to human data and discover a number of potentially new repeat sequences that have been missed by previous repeat annotations. Many of these sequences are incorporated into various parasite genomes, possibly because the filtering process for host DNA involved in the sequencing of the parasite genomes failed to exclude the host-derived repeat sequences. REPdenovo is a powerful new computational tool for annotating genomes and for addressing questions regarding the evolution of repeat families. The software tool, REPdenovo, is available for download at https://github.com/Reedwarbler/REPdenovo.

Highlights

Most genomes, and in particular mammalian genomes, consist of large amounts of repeat elements.
By comparing with repeat annotations stored in existing repeat libraries and latest long human sequence reads, we identify and validate a set of potentially novel repeats in the human genome that are not included in existing repeat annotations.

Summary

Introduction

Most genomes, and in particular mammalian genomes, consist of large amounts of repeat elements. Transposable elements (TEs) are perhaps the most well known. They are believed to constitute 25% to 40% of most mammalian genomes [2, 3, 4, 5] and can amplify themselves in the genome using various mechanisms, typically involving RNA intermediates. There are several existing computational approaches for finding TEs from short sequence reads [13, 14]. The method in [14] assumes a reference genome is available, and finds repeats from sequence reads using the reference. One can use short reads to assemble a reference genome. However, repetitive regions are usually more difficult to assemble. This leads to reduced power for repeat analysis if one uses the assembled reference genome for the purpose of repeat finding.
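To illustrate why repeats can be sought in raw reads without a reference, here is a minimal sketch that counts k-mers across reads and keeps the ones seen unusually often. This is an illustrative assumption, not a description of REPdenovo's actual procedure, which the text above does not detail. The function names `count_kmers` and `frequent_kmers`, the toy reads, and the threshold are all hypothetical.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every length-k substring across all reads.

    Simplification (assumption): reverse complements are not merged,
    and reads are taken to contain only A, C, G, T.
    """
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def frequent_kmers(reads, k, min_count):
    """Return the k-mers seen at least min_count times.

    Sequence that occurs in many genomic copies is sampled by more
    reads, so its k-mers tend to have elevated counts.
    """
    counts = count_kmers(reads, k)
    return {kmer for kmer, n in counts.items() if n >= min_count}


# Toy reads (hypothetical data): "ACGTAC" stands in for a repeat copy
# that appears in three of the four reads.
reads = ["ACGTACGG", "TTACGTAC", "ACGTACTT", "GGCCTTAA"]
candidates = frequent_kmers(reads, k=6, min_count=3)
print(sorted(candidates))
```

On real data, the count threshold would have to be chosen relative to sequencing coverage, and the frequent k-mers would still need to be assembled into longer sequences. Both steps are outside the scope of this sketch.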

Methods

Results

Conclusion

Journal: PLOS ONE
Publication Date: Mar 15, 2016
Citations: 51
License type: CC BY 4.0
