Abstract

The development of cancer is largely driven by the gain or loss of subsets of the genome, promoting uncontrolled growth or disabling defenses against it. Denoising array-based Comparative Genome Hybridization (aCGH) data is an important computational problem central to understanding cancer evolution. In this article, we propose a new formulation of the denoising problem that we solve with a “vanilla” dynamic programming algorithm, which runs in O(n²) time. Then, we propose two approximation techniques. Our first algorithm reduces the problem to a well-studied geometric problem, namely halfspace emptiness queries, and provides an ϵ additive approximation to the optimal objective value in Õ(n^(4/3+Δ) log(U/ϵ)) time, where Δ is an arbitrarily small positive constant and U = max{√C, (|P_i|)_{i=1,…,n}}. Here P = (P_1, P_2, …, P_n), with P_i ∈ ℝ, is the vector of noisy aCGH measurements and C is a normalization constant. The second algorithm provides a (1 ± ϵ) approximation (multiplicative error) and runs in O(n log n / ϵ) time. It decomposes the initial problem into a small (logarithmic) number of Monge optimization subproblems, each of which can be solved in linear time using existing techniques. Finally, we validate our model on synthetic and real cancer datasets. On the data with ground truth, our method consistently achieves higher precision and recall than leading competitors. In addition, it finds several novel markers not recorded in the benchmarks but supported in the oncology literature.
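
To make the O(n²) baseline concrete, here is a minimal Python sketch of a quadratic-time dynamic program for denoising. The abstract does not state the objective, so the sketch assumes a common piecewise-constant formulation: minimize the squared error between the measurements and the fitted signal, plus a penalty C for every segment. The function name `denoise_quadratic` and its parameters `p` and `c` are hypothetical and chosen for illustration only; this is not presented as the authors' exact formulation.

```python
from itertools import accumulate


def denoise_quadratic(p, c):
    """Fit a piecewise-constant signal to p, paying c per segment.

    Assumed objective (not taken from the abstract):
    sum of squared errors + c * (number of segments).
    """
    n = len(p)
    # Prefix sums of values and of squares give any segment's error in O(1).
    s1 = [0.0] + list(accumulate(p))
    s2 = [0.0] + list(accumulate(x * x for x in p))

    def seg_cost(j, i):
        # Squared error of fitting p[j:i] by its mean.
        total = s1[i] - s1[j]
        return (s2[i] - s2[j]) - total * total / (i - j)

    opt = [0.0] * (n + 1)   # opt[i]: best cost for the first i points
    back = [0] * (n + 1)    # back[i]: start index of the last segment
    for i in range(1, n + 1):
        # Try every possible start j of the last segment: O(n) per i.
        opt[i], back[i] = min(
            (opt[j] + seg_cost(j, i) + c, j) for j in range(i)
        )

    # Recover the fitted values by walking the back-pointers.
    fitted = [0.0] * n
    i = n
    while i > 0:
        j = back[i]
        mean = (s1[i] - s1[j]) / (i - j)
        fitted[j:i] = [mean] * (i - j)
        i = j
    return opt[n], fitted
```

Calling `denoise_quadratic(measurements, c)` returns the optimal objective value and the denoised signal. The two nested loops over `i` and `j` are what give the quadratic running time; the approximation algorithms described in the abstract aim to avoid scanning every `j` for each `i`.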
