Deep whole-genome sequencing of 3 cancer cell lines on 2 sequencing platforms

Kanika Arora, Molly Johnson, Jennifer Shelton, Vaidehi Jobanputra, Michael C Zody, Rashesh Sanghvi, Minita Shah, Soren Germer, Kshithija Nagulapalli, Jade Carter, Nicolas Robine, Dayna M Oschwald

doi:10.1038/s41598-019-55636-3

Abstract

To test the performance of a new sequencing platform, develop an updated somatic calling pipeline and establish a reference for future benchmarking experiments, we performed whole-genome sequencing of 3 common cancer cell lines (COLO-829, HCC-1143 and HCC-1187) along with their matched normal cell lines to high sequencing depth (up to 278x coverage) on both Illumina HiSeqX and NovaSeq sequencing instruments. Somatic calling was generally consistent between the two platforms despite minor differences at the read level. We designed and implemented a novel pipeline for the analysis of tumor-normal samples, using multiple variant callers. We show that, coupled with a high-confidence filtering strategy, the use of a combination of tools improves the accuracy of somatic variant calling. We also demonstrate the utility of the dataset by creating an artificial purity ladder to evaluate the somatic pipeline and benchmark methods for estimating purity and ploidy from tumor-normal pairs. The data and results of the pipeline are made accessible to the cancer genomics community.
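As a rough illustration of how an artificial purity ladder could be assembled, the sketch below computes the fraction of reads to subsample from a tumor and a matched normal alignment file to simulate a given purity at a fixed total coverage. The function name, the example coverage values and the simplification that mixing by read fraction approximates purity are illustrative assumptions, not details of the pipeline described here.

```python
def ladder_fractions(tumor_cov, normal_cov, target_cov, purities):
    """Return (purity, tumor_fraction, normal_fraction) for each ladder step.

    tumor_cov / normal_cov: mean coverage of the full tumor and normal data.
    target_cov: desired mean coverage of each mixed sample.
    purities: simulated tumor purities, each between 0 and 1.
    Assumption: the tumor cell line is treated as 100% pure and purity is
    approximated by the fraction of reads drawn from the tumor.
    """
    steps = []
    for p in purities:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"purity must be in [0, 1], got {p}")
        # Share of the target coverage contributed by each source.
        tumor_fraction = p * target_cov / tumor_cov
        normal_fraction = (1.0 - p) * target_cov / normal_cov
        if tumor_fraction > 1.0 or normal_fraction > 1.0:
            raise ValueError("not enough coverage to reach the target")
        steps.append((p, tumor_fraction, normal_fraction))
    return steps


if __name__ == "__main__":
    # Hypothetical coverages, chosen only for illustration.
    for p, t, n in ladder_fractions(100.0, 100.0, 80.0, [0.1, 0.5, 1.0]):
        print(f"purity={p:.2f} tumor_fraction={t:.3f} normal_fraction={n:.3f}")
```

The resulting fractions could then be passed to any read subsampling tool before merging the two subsets into one mixed sample.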

Highlights

The field of cancer genomics has exploded with the development of high-throughput sequencing, largely driven by Illumina’s short-read sequencing technology.
In both Read 1 and Read 2, NovaSeq instruments produced more stretches of Gs than HiSeq X Ten (HiSeqX), which we attributed to an artifact resulting from the fact that G is detected as the absence of signal in the 2-color chemistry of the NovaSeq platform.
While there were some differences in single nucleotide variant (SNV) and indel calls between the two pipelines, we found that copy-number variant (CNV) recall was very similar between the two pipelines in a gene-level comparison (99.8% recall for both our pipeline and the Sanger pipeline).
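One simple way to look for the poly-G artifact mentioned in the highlights is to count the reads that contain a long uninterrupted run of Gs. The following is a minimal sketch, assuming read sequences are available as plain strings; the function name and the 20-base threshold are arbitrary illustrative choices, not values from the study.

```python
import re


def polyg_fraction(reads, min_run=20):
    """Fraction of reads containing a run of at least `min_run` consecutive Gs.

    reads: iterable of read sequences as strings.
    min_run: hypothetical run-length threshold for calling a poly-G stretch.
    """
    pattern = re.compile("G{%d,}" % min_run)
    total = 0
    hits = 0
    for seq in reads:
        total += 1
        if pattern.search(seq.upper()):
            hits += 1
    # An empty input has no reads to classify.
    return hits / total if total else 0.0


if __name__ == "__main__":
    example_reads = ["ACGT" + "G" * 25, "ACGTACGTACGT", "TTTT" + "G" * 5]
    print(polyg_fraction(example_reads))
```

Comparing this fraction between Read 1 and Read 2, and between instruments, gives a quick read-level check of the kind of difference described above.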

Summary

Introduction

The field of cancer genomics has exploded with the development of high-throughput sequencing, largely driven by Illumina’s short-read sequencing technology. With the introduction of any new sequencing technology, it is important to investigate the error profiles and biases of the technology, and to understand their subsequent impact on downstream analyses. This is especially important for cancer data analysis, where varying tumor purity and intra-tumor heterogeneity make it challenging to distinguish low-frequency somatic variants from sequencing noise. We have created a whole-genome reference dataset of 3 matched tumor-normal cell lines sequenced deeply on both HiSeqX and NovaSeq, employed it to evaluate our somatic pipeline, and released it to the genomics community. We believe that it can be used as a reference dataset, together with other similar datasets of real tumors[12,13] or cancer cell lines[8,14].
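To make the effect of purity concrete: for a clonal somatic variant carried on m copies in a tumor with local copy number CN_t, mixed with normal cells of copy number CN_n at purity p, the expected variant allele fraction is p*m / (p*CN_t + (1 - p)*CN_n). The sketch below evaluates this standard relationship; the function and parameter names are illustrative, and it assumes a clonal variant and no sequencing error.

```python
def expected_vaf(purity, multiplicity=1, tumor_cn=2, normal_cn=2):
    """Expected variant allele fraction of a clonal somatic variant.

    purity: fraction of tumor cells in the sample (0 to 1).
    multiplicity: number of tumor copies carrying the variant.
    tumor_cn / normal_cn: local copy number in tumor and normal cells.
    """
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must be between 0 and 1")
    if multiplicity > tumor_cn:
        raise ValueError("multiplicity cannot exceed tumor copy number")
    variant_copies = purity * multiplicity
    total_copies = purity * tumor_cn + (1.0 - purity) * normal_cn
    return variant_copies / total_copies


if __name__ == "__main__":
    # A heterozygous variant in a diploid region at decreasing purity.
    for p in (1.0, 0.5, 0.2, 0.1):
        print(f"purity={p:.1f} expected VAF={expected_vaf(p):.3f}")
```

At low purity the expected allele fraction approaches the range of sequencing noise, which is why deep coverage helps when calling such variants.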

Journal: Scientific Reports
Publication Date: Dec 1, 2019
Citations: 41
License type: open-access
