Abstract

BackgroundCharacterizing large genomic variants is essential to expanding the research and clinical applications of genome sequencing. While multiple data types and methods are available to detect these structural variants (SVs), they remain less characterized than smaller variants because of SV diversity, complexity, and size. These challenges are exacerbated by the experimental and computational demands of SV analysis. Here, we characterize the SV content of a personal genome with Parliament, a publicly available consensus SV-calling infrastructure that merges multiple data types and SV detection methods.ResultsWe demonstrate Parliament’s efficacy via integrated analyses of data from whole-genome array comparative genomic hybridization, short-read next-generation sequencing, long-read (Pacific BioSciences RSII), long-insert (Illumina Nextera), and whole-genome architecture (BioNano Irys) data from the personal genome of a single subject (HS1011). From this genome, Parliament identified 31,007 genomic loci between 100 bp and 1 Mbp that are inconsistent with the hg19 reference assembly. Of these loci, 9,777 are supported as putative SVs by hybrid local assembly, long-read PacBio data, or multi-source heuristics. These SVs span 59 Mbp of the reference genome (1.8%) and include 3,801 events identified only with long-read data. The HS1011 data and complete Parliament infrastructure, including a BAM-to-SV workflow, are available on the cloud-based service DNAnexus.ConclusionsHS1011 SV analysis reveals the limits and advantages of multiple sequencing technologies, specifically the impact of long-read SV discovery. With the full Parliament infrastructure, the HS1011 data constitute a public resource for novel SV discovery, software calibration, and personal genome structural variation analysis.Electronic supplementary materialThe online version of this article (doi:10.1186/s12864-015-1479-3) contains supplementary material, which is available to authorized users.

Highlights

  • Characterizing large genomic variants is essential to expanding the research and clinical applications of genome sequencing

  • HS1011 structural variants (SVs) To provide a robust characterization of structural variation in a human personal genome, we examined multiple data sources from a single individual (HS1011)

  • This individual has been previously analyzed with Array comparative genomic hybridization (aCGH) data and by whole-genome and whole-exome sequencing, revealing novel SH3TC2 single-nucleotide variant (SNV) causative for the subject’s autosomal recessive Charcot-Marie-Tooth (CMT) neuropathy [25,26]

Read more

Summary

Introduction

Characterizing large genomic variants is essential to expanding the research and clinical applications of genome sequencing. While multiple data types and methods are available to detect these structural variants (SVs), they remain less characterized than smaller variants because of SV diversity, complexity, and size. These challenges are exacerbated by the experimental and computational demands of SV analysis. Analysis of personal genome sequence variation data is currently dominated by single-nucleotide variant (SNV) and small insertion and deletion (indel) detection. Account for a greater number of variable bases in the genome, with up to 13% of the human genome subject to such variation [4] These structural variants (SVs) include copy-number variants (CNVs), copy-number neutral inversions, mobile element insertions (MEIs), deletions, translocations, and complex combinations of these events. As whole-genome sequencing (WGS) becomes increasingly employed as a research and molecular diagnostic tool, complete and accurate characterization of human genomic variation, including SVs, will be essential to informing clinical decision making

Methods
Results
Discussion
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call