Abstract

Repetitive sequences are abundant in the human genome. Different classes of repetitive DNA sequences, including simple repeats, tandem repeats, segmental duplications, interspersed repeats, and other elements, collectively span more than 50% of the genome. Because repeat sequences occur in the genome at different scales, they can cause various types of sequence analysis errors, including errors in alignment, de novo assembly, and annotation. This mini-review highlights the challenges that small-scale repeat sequences, especially near-identical tandem or closely located repeats and short tandem repeats, pose for discovering DNA insertion and deletion (indel) mutations from next-generation sequencing data. We also discuss the de Bruijn graph sequence assembly paradigm, which is emerging as the most popular and promising approach for detecting indels. We use the human exome as an example to show how these repetitive elements can obscure indels or introduce errors when detecting these types of mutations.
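To make the interaction between short tandem repeats and de Bruijn graphs concrete, the following toy sketch builds a de Bruijn graph from a single sequence and checks whether the graph contains a cycle. It is an illustration under simplifying assumptions (one error-free sequence rather than sequencing reads, no reverse complements), not the method of any particular indel caller. The function names `build_de_bruijn` and `has_cycle` and the example sequence are hypothetical and chosen only for this illustration.

```python
from collections import defaultdict


def build_de_bruijn(seq, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in seq."""
    graph = defaultdict(set)
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # Each k-mer contributes one edge: prefix (k-1)-mer -> suffix (k-1)-mer.
        graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


def has_cycle(graph):
    """Return True if the directed graph contains a cycle (iterative DFS)."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the DFS stack, finished
    color = defaultdict(int)
    for start in graph:
        if color[start] != WHITE:
            continue
        color[start] = GRAY
        stack = [(start, iter(graph.get(start, ())))]
        while stack:
            node, successors = stack[-1]
            for nxt in successors:
                if color[nxt] == GRAY:
                    return True  # back edge: nxt is still on the stack
                if color[nxt] == WHITE:
                    color[nxt] = GRAY
                    stack.append((nxt, iter(graph.get(nxt, ()))))
                    break
            else:
                # All successors explored; node is finished.
                color[node] = BLACK
                stack.pop()
    return False


# Hypothetical example: six copies of the unit CAG between unique flanks.
seq = "ACGTTGCA" + "CAG" * 6 + "TTGACCGT"

print(has_cycle(build_de_bruijn(seq, 4)))   # True: k is shorter than the repeat
print(has_cycle(build_de_bruijn(seq, 21)))  # False: every 20-mer spans a flank
```

With k = 4 the repeat collapses into the loop CAG -> AGC -> GCA -> CAG, so the number of repeat units (and therefore any indel that adds or removes a unit) cannot be read from the graph alone. With k = 21 every node overlaps unique flanking sequence and the graph is a simple path. This is one way to see why the choice of k matters in repetitive regions.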

Highlights

  • Enormous advances made over the last decade in next-generation sequencing technologies and computational variation analysis have made it feasible to study human genetics in unprecedented detail

  • While historically genomic studies have focused on single nucleotide polymorphisms (SNPs) due to their prevalence and relative technical simplicity, a recent trend has been to study the role of insertion and deletion mutations

  • We show examples of the types of errors introduced by these repetitive structures and provide recommendations on how to reduce or avoid them


BIOENGINEERING AND BIOTECHNOLOGY

Reviewed by: Francesco Vezzi, SciLifeLab, Sweden; Lisle Elliott Mose, University of North Carolina at Chapel Hill, USA; Pierre Peterlongo, Inria, France.

Narzisi and Schatz
