Abstract

Background: Many models have been proposed to detect copy number alterations in chromosomal copy number profiles, but it is usually not obvious which is most effective for a given data set. Furthermore, most methods have a smoothing parameter that determines the number of breakpoints and must be chosen using various heuristics.

Results: We present three contributions for copy number profile smoothing model selection. First, we propose to select the model and degree of smoothness that maximizes agreement with visual breakpoint region annotations. Second, we develop cross-validation procedures to estimate the error of the trained models. Third, we apply these methods to compare 17 smoothing models on a new database of 575 annotated neuroblastoma copy number profiles, which we make available as a public benchmark for testing new algorithms.

Conclusions: Whereas previous studies have been qualitative or limited to simulated data, our annotation-guided approach is quantitative and suggests which algorithms are fastest and most accurate in practice on real data. In the neuroblastoma data, the equivalent pelt.n and cghseg.k methods were the best breakpoint detectors, and exhibited reasonable computation times.

Highlights

  • Many models have been proposed to detect copy number alterations in chromosomal copy number profiles, but it is usually not obvious which is most effective for a given data set

  • Array comparative genomic hybridization microarrays have been developed as genome-wide assays for copy number alterations (CNAs), using the fact that microarray fluorescence intensity is proportional to DNA copy number [4]

  • If you want to use a particular penalty constant β instead of the annotation-guided approach we suggest in this article, the default Pruned Exact Linear Time (PELT) method offers a modest speedup over cghseg


Summary

Introduction

Many models have been proposed to detect copy number alterations in chromosomal copy number profiles, but it is usually not obvious which is most effective for a given data set.

The need for smoothing model selection criteria

DNA copy number alterations (CNAs) can result from various types of genomic rearrangements, and are important in the study of many types of cancer [1]. Clinical outcome of patients with neuroblastoma has been shown to be worse for tumors with segmental alterations or breakpoints in specific genomic regions [2,3]. To construct an accurate predictive model of clinical outcome for these tumors, we must first accurately detect the precise location of each breakpoint. Each model makes different assumptions about the data, and it is not obvious which model is appropriate for a given data set.
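To make the idea of annotation-guided selection concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a single profile, a least-squares segmentation cost with a penalty β per breakpoint, and a hypothetical annotation format of `(start, end, label)` triples where the label is `"breakpoint"` (at least one change expected in the region) or `"normal"` (no change expected). The segmentation uses plain quadratic-time optimal partitioning; PELT adds pruning to this recursion, which is omitted here. The function names are illustrative only.

```python
import numpy as np


def optimal_partitioning(y, beta):
    """Breakpoints minimizing squared error plus beta per breakpoint.

    A breakpoint at position b means the mean changes between y[b-1] and y[b].
    This is the unpruned O(n^2) dynamic program, not PELT itself.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    cs = np.concatenate(([0.0], np.cumsum(y)))        # prefix sums of y
    cs2 = np.concatenate(([0.0], np.cumsum(y * y)))   # prefix sums of y^2

    def seg_cost(s, t):
        # Squared error of y[s:t] around its own mean.
        total = cs[t] - cs[s]
        return cs2[t] - cs2[s] - total * total / (t - s)

    best = np.full(n + 1, np.inf)
    best[0] = -beta                    # the first segment carries no penalty
    last = np.zeros(n + 1, dtype=int)  # start index of the final segment
    for t in range(1, n + 1):
        for s in range(t):
            cost = best[s] + beta + seg_cost(s, t)
            if cost < best[t]:
                best[t] = cost
                last[t] = s

    # Backtrack from the end of the profile to recover segment starts.
    breaks = []
    t = n
    while t > 0:
        s = int(last[t])
        if s > 0:
            breaks.append(s)
        t = s
    return sorted(breaks)


def annotation_errors(breaks, annotations):
    """Count annotated regions whose label disagrees with the breakpoints."""
    errors = 0
    for start, end, label in annotations:
        count = sum(start < b <= end for b in breaks)
        if label == "breakpoint" and count == 0:
            errors += 1  # false negative: expected change was missed
        elif label == "normal" and count > 0:
            errors += 1  # false positive: change predicted in a flat region
    return errors


def select_beta(y, annotations, betas):
    """Return (beta, errors) with the fewest annotation errors.

    Ties are broken in favor of the smallest beta (an arbitrary choice).
    """
    scored = [
        (annotation_errors(optimal_partitioning(y, b), annotations), b)
        for b in betas
    ]
    errors, beta = min(scored)
    return beta, errors
```

In this sketch a penalty that is too small produces breakpoints inside `"normal"` regions, a penalty that is too large misses `"breakpoint"` regions, and `select_beta` keeps the candidate with the fewest such disagreements. Extending it to many profiles and estimating test error by cross-validation is left out.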
