Abstract

Motivation: Repetitive sequences account for approximately half of the human genome. Accurately ascertaining sequences in these regions with next-generation sequencers is challenging and requires a different set of analytical techniques than for reads originating from unique sequences. Complicating the matter are repetitive regions subject to programmed rearrangements, as is the case with the antigen-binding domains in the immunoglobulin (Ig) and T-cell receptor (TCR) loci.

Results: We developed a probability-based score and visualization method to aid in distinguishing true structural variants from alignment artifacts. We demonstrate the usefulness of this method by its ability to separate real structural variants from false positives generated by existing upstream analysis tools. We validated our approach using both target-capture and whole-genome experiments. Capture sequencing reads were generated over the Ig and TCR loci from primary lymphoid tumors, cancer cell lines and an EBV-transformed lymphoblast cell line. Whole-genome sequencing reads were from a lymphoblastoid cell line.

Availability: We implement our method as an R package available at https://github.com/Eitan177/targetSeqView. Code to reproduce the figures and results is also available.

Contact: ehalper2@jhmi.edu

Supplementary information: Supplementary data are available at Bioinformatics online.

Highlights

  • Structural variants (SVs), including deletions, insertions, inversions and translocations, are known to contribute to a wide range of human phenotypes (Schinzel, 1988)

  • Repetitive DNA, stretches of nucleotides present in more than one copy in the haploid genome, accounts for about half of the human genome. These stretches may be sub-classified by length, copy number, base composition and linear organization, all of which are difficult to assay with next-generation sequencing (NGS) (Treangen and Salzberg, 2012)

  • Our sequencing library for the target-capture experiment included fragments selectively captured from V(D)J loci on chromosomes 2, 7, 14 and 22 in a panel of neoplastic B and T lymphocytes and an EBV-transformed cell line


Introduction

Structural variants (SVs), including deletions, insertions, inversions and translocations, are known to contribute to a wide range of human phenotypes (Schinzel, 1988). Despite collaborative efforts and recent advances, the discovery and annotation of the full landscape of SVs in humans remains incomplete. This is due in part to the inaccuracy of next-generation sequencing (NGS) in resolving repetitive DNA. Repetitive DNA, stretches of nucleotides present in more than one copy in the haploid genome, accounts for about half of the human genome. These stretches may be sub-classified by length, copy number, base composition and linear organization, all of which are difficult to assay with NGS (Treangen and Salzberg, 2012).
