The challenge of small-scale repeats for indel discovery.

Giuseppe Narzisi, Michael C Schatz

doi:10.3389/fbioe.2015.00008

Open Access

Abstract

Repetitive sequences are abundant in the human genome. Different classes of repetitive DNA sequences, including simple repeats, tandem repeats, segmental duplications, interspersed repeats, and other elements, collectively span more than 50% of the genome. Because repeat sequences occur in the genome at different scales, they can cause various types of sequence analysis errors, including errors in alignment, de novo assembly, and annotation. This mini-review highlights the challenges that small-scale repeat sequences, especially near-identical tandem or closely located repeats and short tandem repeats, introduce for discovering DNA insertion and deletion (indel) mutations from next-generation sequencing data. We also discuss the de Bruijn graph sequence assembly paradigm, which is emerging as the most popular and promising approach for detecting indels. Taking the human exome as an example, we highlight how these repetitive elements can obscure these types of mutations or introduce errors while detecting them.
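To make the interaction between short tandem repeats and de Bruijn graphs concrete, the following toy sketch may help. It is not the authors' method or any published tool: the sequences `REF` and `ALT` and the helper names `kmers`, `de_bruijn_graph`, and `repeated_nodes` are hypothetical, chosen only for illustration. It assumes error-free sequences and ignores k-mer multiplicity. Under those assumptions it shows that deleting one unit of a CAG repeat leaves the set of k-mers unchanged when k is small, so the deletion is invisible in the graph, while a k larger than the repeated stretch separates the two alleles.

```python
from collections import Counter, defaultdict

# Hypothetical toy alleles: ALT lacks one CAG unit of the tandem repeat in REF.
REF = "ACGT" + "CAG" * 4 + "TTGA"
ALT = "ACGT" + "CAG" * 3 + "TTGA"


def kmers(seq, k):
    """Return the set of distinct k-mers in seq (multiplicity is ignored)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def de_bruijn_graph(seq, k):
    """Map each (k-1)-mer node to the set of (k-1)-mer nodes that follow it."""
    graph = defaultdict(set)
    for kmer in kmers(seq, k):
        graph[kmer[:-1]].add(kmer[1:])
    return graph


def repeated_nodes(seq, k):
    """Return the (k-1)-mers that occur more than once in seq.

    For a single linear sequence, a repeated node means the walk that
    spells the sequence revisits a node, so the graph contains a cycle.
    """
    counts = Counter(seq[i:i + k - 1] for i in range(len(seq) - k + 2))
    return {node for node, n in counts.items() if n > 1}


if __name__ == "__main__":
    for k in (4, 10, 11):
        same = kmers(REF, k) == kmers(ALT, k)
        print(f"k={k}: identical k-mer sets={same}, "
              f"repeated nodes in REF={sorted(repeated_nodes(REF, k))}")
```

In this sketch the two alleles yield the same k-mer set at k = 4, and the reference graph still contains a repeated node at k = 10; only at k = 11 does the reference path become free of repeated nodes. Real data add sequencing errors, heterozygosity, and read-length limits on k, none of which this example models.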

Highlights

Enormous advances made over the last decade in next-generation sequencing technologies and computational variation analysis have made it feasible to study human genetics in unprecedented detail.
While genomic studies have historically focused on single nucleotide polymorphisms (SNPs) due to their prevalence and relative technical simplicity, a recent trend has been to study the role of insertion and deletion mutations.
We show examples of the types of errors introduced by these repetitive structures, and we provide recommendations on how to reduce or avoid these errors.

Summary

Reviewed by: Francesco Vezzi, SciLifeLab, Sweden; Lisle Elliott Mose, University of North Carolina at Chapel Hill, USA; Pierre Peterlongo, Inria, France.

Different classes of repetitive DNA sequences, including simple repeats, tandem repeats, segmental duplications, interspersed repeats, and other elements, collectively span more than 50% of the genome. Because repeat sequences occur in the genome at different scales, they can cause various types of sequence analysis errors, including errors in alignment, de novo assembly, and annotation. This mini-review highlights the challenges that small-scale repeat sequences, especially near-identical tandem or closely located repeats and short tandem repeats, introduce for discovering DNA insertion and deletion (indel) mutations from next-generation sequencing data. Taking the human exome as an example, we highlight how these repetitive elements can obscure these types of mutations or introduce errors while detecting them.

Journal: Frontiers in Bioengineering and Biotechnology
Publication Date: Jan 26, 2015
Citations: 53
License type: cc-by


