Germline contamination and leakage in whole genome somatic single nucleotide variant detection

Dorota H Sendorek, J Christopher Bare, Adam D Ewing, Takafumi N Yamaguchi, Adam A Margolin, Joshua M Stuart, Kathleen E Houlahan, Thea C Norman, Kyle Ellrott, Paul C Boutros, Cristian Caloian

doi:10.1186/s12859-018-2046-0

Abstract

Background: The clinical sequencing of cancer genomes to personalize therapy is becoming routine across the world. However, concerns over patient re-identification from these data lead to questions about how tightly access should be controlled. It is not thought to be possible to re-identify patients from somatic variant data. However, somatic variant detection pipelines can mistakenly identify germline variants as somatic ones, a process called “germline leakage”. The rate of germline leakage across different somatic variant detection pipelines is not well understood, and it is uncertain whether or not somatic variant calls should be considered re-identifiable. To fill this gap, we quantified germline leakage across 259 sets of whole-genome somatic single nucleotide variant (SNV) predictions made by 21 teams as part of the ICGC-TCGA DREAM Somatic Mutation Calling Challenge.

Results: The median somatic SNV prediction set contained 4325 somatic SNVs and leaked one germline polymorphism. The level of germline leakage was inversely correlated with somatic SNV prediction accuracy and positively correlated with the amount of infiltrating normal cells. The specific germline variants leaked differed by tumour and algorithm. To aid in quantitation and correction of leakage, we created a tool, called GermlineFilter, for use in public-facing somatic SNV databases.

Conclusions: The potential for patient re-identification from leaked germline variants in somatic SNV predictions has led to divergent open data access policies, based on different assessments of the risks. Indeed, a single, well-publicized re-identification event could reshape public perceptions of the value of genomic data sharing. We find that modern somatic SNV prediction pipelines have low germline-leakage rates, which can be further reduced, especially for cloud-sharing, using pre-filtering software.
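To make the notion of germline leakage concrete, the following is a minimal sketch of how leakage in one prediction set could be counted, assuming variants are compared by exact match on chromosome, position, reference allele and alternate allele. The function name `find_leaked_variants`, the tuple layout and the toy data are hypothetical choices for illustration; this is not the GermlineFilter implementation, whose internals are not described here.

```python
from typing import Iterable, Set, Tuple

# Hypothetical variant key: (chromosome, 1-based position, ref allele, alt allele).
Variant = Tuple[str, int, str, str]


def find_leaked_variants(
    somatic_calls: Iterable[Variant],
    germline_calls: Iterable[Variant],
) -> Tuple[Set[Variant], float]:
    """Return the predicted somatic SNVs that also appear in the patient's
    germline call set, plus the fraction of somatic predictions they make up."""
    somatic = set(somatic_calls)
    germline = set(germline_calls)

    # A "leaked" variant is a germline variant reported as somatic.
    leaked = somatic & germline

    # Guard against an empty prediction set to avoid division by zero.
    rate = len(leaked) / len(somatic) if somatic else 0.0
    return leaked, rate


# Toy data, invented purely for illustration.
example_somatic = [
    ("1", 100, "A", "T"),
    ("2", 200, "C", "G"),
    ("3", 300, "G", "A"),
    ("X", 400, "T", "C"),
]
example_germline = [
    ("2", 200, "C", "G"),
    ("5", 500, "A", "C"),
]

example_leaked, example_rate = find_leaked_variants(example_somatic, example_germline)
print(sorted(example_leaked), example_rate)
```

In this toy case one of the four predicted somatic SNVs matches a germline call. Exact matching on all four fields is a deliberate simplification; a real comparison would also need to agree on reference genome build and variant normalization.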

Highlights

The clinical sequencing of cancer genomes to personalize therapy is becoming routine across the world
We sought to evaluate the extent of germline contamination in contemporary cancer whole-genome sequencing (WGS) datasets, specifically those comprising somatic single nucleotide variant (SNV) predictions across the entire genome
The tumour binary alignment map (BAM) file was finalized by adding somatic mutations: both SNVs and structural variants

Summary

Introduction

The clinical sequencing of cancer genomes to personalize therapy is becoming routine across the world. The appropriate limits on data sharing remain a contentious issue throughout biomedical research, as shown by recent controversies [1]. Studies such as the Personal Genome Project (PGP) have pioneered open sharing of [...]. The data collectors may find themselves under time constraints, unable to comprehensively exploit the data they produced without competition from subsequent researchers who are able to use the data freely. This effectively disincentivizes the challenging work of dataset creation, producing a situation akin to a tragedy of the commons. We focus here on this fourth challenge, or re-identifiability.

Journal: BMC Bioinformatics 19:28 | Publication Date: Jan 31, 2018
Citations: 8 | License type: open-access

Similar Papers

B-249 Usefulness of Simultaneous Detection of Germline and Somatic Variants From Different Specimen Types by Next-generation Sequencing
M Han ... J Kim
Clinical Chemistry | VOL. 69
27 Sep 2023

Computational Prediction of the Pathogenic Status of Cancer-Specific Somatic Variants.
Nikta Feizi ... Leigh Murphy
Frontiers in Genetics | VOL. 12
18 Jan 2022

Are the current guidelines for identification of myelodysplastic syndrome with germline predisposition strong enough?
Oriol Calvete ... Jaroslaw P Maciejewski
British Journal of Haematology | VOL. 201
30 Jan 2023

Somatic single nucleotide variations and copy number variation can be used to distinguish high grade serous ovarian cancer from benign fallopian tubes with high accuracy (219)
Nicholas Cardillo ... Eric Devor
Gynecologic Oncology | VOL. 166
01 Aug 2022
