Abstract
Background
Full Bayesian inference for detecting copy number variants (CNV) from whole-genome sequencing (WGS) data is still largely infeasible due to its computational demands. A recently introduced approach to performing Forward–Backward Gibbs sampling using dynamic Haar wavelet compression has alleviated issues of convergence and, to some extent, speed. Yet the problem remains challenging in practice.
Results
In this paper, we propose an improved algorithmic framework for this approach. We provide new space-efficient data structures to query sufficient statistics in logarithmic time, based on a linear-time, in-place transform of the data, which also improves on the compression ratio. We also propose a new approach to efficiently store and update marginal state counts obtained from the Gibbs sampler.
Conclusions
Using this approach, we discover several CNV candidates in two rat populations divergently selected for tame and aggressive behavior, consistent with earlier results concerning the domestication syndrome as well as experimental observations. Computationally, we observe a 29.5-fold decrease in memory usage, an average 5.8-fold speedup, and a 191-fold decrease in minor page faults. These metrics varied greatly across runs of the old implementation, but not of the new one; we conjecture that this is due to the better compression scheme. The fully Bayesian segmentation of the entire WGS data set required 3.5 min and 1.24 GB of memory, and can hence be performed on a commodity laptop.
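To give a concrete feel for the transform mentioned above, the following is a minimal sketch of a textbook in-place Haar wavelet transform in an interleaved layout, with unnormalized average/difference coefficients. It illustrates only the linear-time, in-place property; it is not the paper's breakpoint-array construction.

```python
def haar_inplace(x):
    """In-place Haar wavelet transform (illustrative sketch).

    Each level replaces pairs by their average (smooth coefficient)
    and difference (detail coefficient) in an interleaved layout.
    Level sizes halve, so total work is n/2 + n/4 + ... < n pair
    operations, i.e. linear time, with no auxiliary array.
    """
    n = len(x)
    assert n and n & (n - 1) == 0, "length must be a power of two"
    step = 1
    while step < n:
        for i in range(0, n, 2 * step):
            a, b = x[i], x[i + step]
            x[i] = (a + b) / 2.0   # smooth (average) coefficient
            x[i + step] = a - b    # detail (difference) coefficient
        step *= 2
    return x

# After the transform, x[0] holds the overall mean and the remaining
# entries hold detail coefficients at successively coarser scales.
```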
Highlights
Full Bayesian inference for detecting copy number variants (CNV) from whole-genome sequencing (WGS) data is still largely infeasible due to computational demands
Though the advantages of Bayesian segmentation over frequentist approaches have previously been noted [6,7,8,9,10], inference is computationally demanding on WGS-scale data; in particular, Bayesian methods that rely on Markov chain Monte Carlo (MCMC) approximations are infeasible on standard computers in terms of memory requirements, speed, and convergence characteristics
We present a case study of CNV inference on differential WGS read depth data using HaMMLET with the Haar breakpoint array; a sketch of the kind of query such a structure supports follows below
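The Haar breakpoint array itself is described in the paper. As a generic illustration of logarithmic-time sufficient-statistic queries, the sketch below uses a pair of Fenwick (binary indexed) trees over the observations y and their squares. This is a stand-in, not the authors' data structure; for fully static data, plain prefix arrays would even answer such queries in constant time, but the Fenwick layout also tolerates updates.

```python
class FenwickStats:
    """Range queries of sufficient statistics in O(log n) (illustrative)."""

    def __init__(self, data):
        self.n = len(data)
        self.s1 = [0.0] * (self.n + 1)   # Fenwick tree for sum(y)
        self.s2 = [0.0] * (self.n + 1)   # Fenwick tree for sum(y^2)
        for i, y in enumerate(data, start=1):
            self._add(i, y)

    def _add(self, i, y):
        # Fenwick point update: O(log n)
        while i <= self.n:
            self.s1[i] += y
            self.s2[i] += y * y
            i += i & -i

    def _prefix(self, tree, i):
        # Fenwick prefix sum over positions 1..i: O(log n)
        total = 0.0
        while i > 0:
            total += tree[i]
            i -= i & -i
        return total

    def block_stats(self, lo, hi):
        """Count, sum, and sum of squares of data[lo:hi] (0-based, half-open)."""
        s = self._prefix(self.s1, hi) - self._prefix(self.s1, lo)
        ss = self._prefix(self.s2, hi) - self._prefix(self.s2, lo)
        return hi - lo, s, ss
```

For a block [lo, hi) produced by wavelet-based compression, these three numbers are exactly the per-block sufficient statistics a sampler needs to evaluate, for example, Gaussian emission likelihoods.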
Summary
We propose an improved algorithmic framework for this approach: new space-efficient data structures that query sufficient statistics in logarithmic time, built on a linear-time, in-place transform of the data that also improves the compression ratio, together with a new scheme to efficiently store and update the marginal state counts obtained from the Gibbs sampler
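To make the marginal-count idea concrete, here is an illustrative run-length scheme (the authors' actual record structure may differ). Each Gibbs sweep yields a segmentation as (run length, state) pairs over the compressed blocks; marginal counts are kept as run-length encoded per-state count vectors, so folding in a sweep costs time proportional to the number of runs rather than the number of data positions.

```python
def init_marginals(n, n_states):
    """One run covering all n positions, with zero counts per state."""
    return [(n, [0] * n_states)]

def merge_sweep(acc, sweep):
    """Fold one Gibbs sweep into run-length encoded marginal counts.

    acc:   [(length, counts_per_state)]  accumulated over sweeps
    sweep: [(length, state)]             this sweep's segmentation
    Both encodings must cover the same total length. A two-pointer
    merge splits runs only where the two encodings' boundaries cross,
    so the cost is proportional to the number of runs (compressed
    blocks), not to the number of positions.
    """
    out = []
    i = j = 0
    rem_acc, rem_swp = acc[0][0], sweep[0][0]
    while i < len(acc) and j < len(sweep):
        take = min(rem_acc, rem_swp)
        counts = list(acc[i][1])      # copy accumulated counts,
        counts[sweep[j][1]] += 1      # then credit this sweep's state
        out.append((take, counts))
        rem_acc -= take
        rem_swp -= take
        if rem_acc == 0:
            i += 1
            if i < len(acc):
                rem_acc = acc[i][0]
        if rem_swp == 0:
            j += 1
            if j < len(sweep):
                rem_swp = sweep[j][0]
    return out

# Example with 6 positions and 3 states:
acc = init_marginals(6, n_states=3)
acc = merge_sweep(acc, [(4, 0), (2, 2)])   # sweep 1
acc = merge_sweep(acc, [(3, 0), (3, 1)])   # sweep 2
# acc is now [(3, [2, 0, 0]), (1, [1, 1, 0]), (2, [0, 1, 1])]
```

Adjacent runs with identical count vectors could additionally be re-merged after each sweep to keep the encoding short; that step is omitted here for brevity.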