High-coverage, long-read sequencing of Han Chinese trio reference samples

Ying-Chih Wang, Justin M Zook, Jonathan Trow, Hardik Shah, Aaron M Wenger, Gintaras Deikus, Robert Sebra, Nathan D Olson, Melissa Smith, Stephen Sherry, Marc L Salit, Chunlin Xiao

doi:10.1038/s41597-019-0098-2

Abstract

Single-molecule long-read sequencing datasets were generated for a son-father-mother trio of Han Chinese descent that is part of the Genome in a Bottle (GIAB) consortium portfolio. The data were generated using the Pacific Biosciences Sequel System. The son was sequenced to an average coverage of 60x and each parent to 30x, with N50 subread lengths between 16 and 18 kb. Raw reads and reads aligned to both GRCh37 and GRCh38 are available at the NCBI GIAB FTP site (ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/ChineseTrio/). The GRCh38-aligned read data are archived in NCBI SRA (SRX4739017, SRX4739121, and SRX4739122). The dataset is available to anyone for developing and evaluating long-read bioinformatics methods.
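The abstract summarizes subread lengths by their N50: sort the lengths in decreasing order and report the length at which the running total first reaches half of all sequenced bases. The sketch below computes it in plain Python. The function name `n50` and the toy lengths are hypothetical illustrations, not code or values from this dataset or from PacBio's own tools.

```python
from typing import Iterable


def n50(read_lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of read (or subread) lengths.

    N50 is the largest length L such that reads of length >= L
    together contain at least half of all sequenced bases.
    """
    # Longest reads first, so the running total accumulates from the top.
    lengths = sorted((int(x) for x in read_lengths), reverse=True)
    if not lengths:
        raise ValueError("no read lengths supplied")
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        # First length at which the cumulative bases reach half the total.
        if running >= half:
            return length
    return lengths[-1]  # not reached for non-empty input


if __name__ == "__main__":
    # Toy subread lengths in bp (hypothetical, not from this dataset).
    toy = [20_000, 18_000, 16_000, 9_000, 5_000, 2_000]
    print(n50(toy))
```

For the toy input the total is 70,000 bp, half is 35,000 bp, and the running sum first crosses that threshold at the 18,000 bp read. In practice the lengths would be read from the subread BAM or FASTQ files; that parsing step is omitted to keep the sketch dependency-free.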

Highlights

Genome in a Bottle (GIAB) is a consortium hosted by the National Institute of Standards and Technology (NIST), primarily dedicated to the development and characterization of human genomic reference materials.
Expanding the benchmark to more challenging variants and regions using long-read sequencing technologies is of interest to the consortium and its stakeholders, including technology and bioinformatics developers, clinical laboratories, and regulatory agencies[7].
To this end, a high-coverage long-read sequence dataset was generated for the Han Chinese trio using the PacBio Sequel System (Pacific Biosciences, Menlo Park, CA, USA).
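As a worked example of what "high coverage" means here, average depth is conventionally the total number of sequenced bases divided by the genome size. Assuming a haploid genome size of roughly 3.1 Gb (an approximation for GRCh38, not a figure taken from this paper; the authors may also have computed coverage from aligned rather than raw bases), the reported averages of 60x for the son and 30x for each parent correspond approximately to:

```latex
\begin{aligned}
\bar{C} &= \frac{\sum_{i} \ell_i}{G}, \qquad \ell_i = \text{length of read } i,\quad G \approx 3.1\times10^{9}\ \text{bp} \\
\text{son } (\bar{C}=60):\quad \textstyle\sum_i \ell_i &\approx 60 \times 3.1\times10^{9}\ \text{bp} = 1.86\times10^{11}\ \text{bp} \approx 186\ \text{Gb} \\
\text{each parent } (\bar{C}=30):\quad \textstyle\sum_i \ell_i &\approx 30 \times 3.1\times10^{9}\ \text{bp} = 9.3\times10^{10}\ \text{bp} \approx 93\ \text{Gb}
\end{aligned}
```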


Background & Summary

Methods


Data Records

Technical Validation


Usage Notes

Author Contributions

Findings

Additional Information


Journal: Scientific Data
Publication Date: Jun 14, 2019
Citations: 12
License type: open-access


