Copy number variation (CNV), a major form of genomic structural variation, contributes substantially to the genetic diversity of cancer genomes. Many studies treat CNV identification as an outlier-detection problem. Read depth (RD) signals are first extracted for genomic segments from next-generation sequencing (NGS) data, and each segment is then assigned an outlier score based on the distance between its RD signal and those of adjacent segments. However, this distance is usually computed from the mean and covariance of the global RD distribution, and those estimates are themselves distorted by the CNVs being sought, which leads to inaccurate detection. To address this problem, we introduce CNV_MCD, a CNV detection method based on the minimum covariance determinant (MCD). Rather than computing the mean and covariance of the RD profile directly from all segments, CNV_MCD estimates them from the subset of segments whose covariance matrix has the smallest determinant, which limits the influence of CNV segments on the estimates. This yields a robust distance for each genomic segment, which serves as its outlier score. We further apply fast median filtering to correct baseline drift in the outlier scores and use a chi-squared approximation to set the cutoff distance for calling CNVs. These steps facilitate the detection of small CNVs and make CNV_MCD a complementary method for CNV detection in low-coverage sequencing data. Extensive experiments on simulated and real datasets show that CNV_MCD outperforms other popular CNV detection methods. Overall, CNV_MCD offers a more robust and reliable approach to CNV detection, which may help elucidate the genetic mechanisms underlying complex diseases such as cancer.
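The pipeline described above (robust MCD distances as outlier scores, median-filter baseline correction, chi-squared cutoff) can be illustrated with a minimal sketch. It is not the authors' implementation and assumes NumPy is available. The MCD step is a simplified FAST-MCD (random half-size subsets refined by concentration steps, followed by the usual consistency rescaling); scikit-learn's `sklearn.covariance.MinCovDet` is a production alternative. The two features per bin (its RD and the mean RD of its two neighbours), the window of 101 bins, `alpha`, and all function names are hypothetical choices for illustration. Fixing the feature dimension at p = 2 lets the chi-squared(2) quantile be written in closed form as -2 ln(alpha).

```python
import math

import numpy as np


def _sq_mahalanobis(X, mu, cov):
    """Squared Mahalanobis distance of every row of X from (mu, cov)."""
    diff = X - mu
    sol = np.linalg.solve(cov, diff.T).T
    return np.einsum("ij,ij->i", diff, sol)


def fast_mcd(X, n_starts=20, max_csteps=50, seed=0):
    """Simplified FAST-MCD (illustrative): random h-subsets refined by C-steps.

    Returns a robust location and a consistency-corrected scatter matrix.
    This sketch hardcodes chi2(2) constants, so it only accepts p == 2.
    """
    n, p = X.shape
    if p != 2:
        raise ValueError("sketch assumes exactly 2 features per bin")
    h = (n + p + 1) // 2                  # maximal-breakdown subset size
    ridge = 1e-9 * np.eye(p)              # guards against singular subsets
    rng = np.random.default_rng(seed)
    best_logdet, best_mu, best_cov = np.inf, None, None
    for _ in range(n_starts):
        idx = np.sort(rng.choice(n, size=h, replace=False))
        for _ in range(max_csteps):
            mu = X[idx].mean(axis=0)
            cov = np.cov(X[idx], rowvar=False) + ridge
            # C-step: keep the h points closest under the current estimate;
            # this never increases det(cov).
            new_idx = np.sort(np.argsort(_sq_mahalanobis(X, mu, cov))[:h])
            if np.array_equal(new_idx, idx):
                break
            idx = new_idx
        mu = X[idx].mean(axis=0)
        cov = np.cov(X[idx], rowvar=False) + ridge
        _, logdet = np.linalg.slogdet(cov)
        if logdet < best_logdet:          # keep the lowest-determinant subset
            best_logdet, best_mu, best_cov = logdet, mu, cov
    # Rescale so squared distances of clean bins follow chi2(2),
    # whose median is 2*ln(2).
    d2 = _sq_mahalanobis(X, best_mu, best_cov)
    best_cov = best_cov * np.median(d2) / (2.0 * math.log(2.0))
    return best_mu, best_cov


def running_median(x, window):
    """Centered running median with edge padding (window must be odd)."""
    half = window // 2
    padded = np.pad(x, half, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, window)
    return np.median(windows, axis=1)


def detect_cnv_bins(rd, alpha=1e-3, window=101, seed=0):
    """Flag bins whose drift-corrected robust distance exceeds a chi2 cutoff."""
    rd = np.asarray(rd, dtype=float)
    padded = np.pad(rd, 1, mode="edge")
    neighbours = 0.5 * (padded[:-2] + padded[2:])    # mean of bins i-1, i+1
    X = np.column_stack([rd, neighbours])            # hypothetical 2-D features
    mu, cov = fast_mcd(X, seed=seed)
    scores = _sq_mahalanobis(X, mu, cov)             # robust outlier scores
    # Subtract the slowly varying baseline, keeping the chi2 scale.
    corrected = scores - running_median(scores, window) + np.median(scores)
    cutoff = -2.0 * math.log(alpha)                  # upper-alpha chi2(2) quantile
    return np.flatnonzero(corrected > cutoff), corrected, cutoff


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    rd = rng.normal(100.0, 5.0, size=1000)           # simulated diploid RD
    rd[400:420] += 50.0                              # simulated one-copy gain
    hits, _, cutoff = detect_cnv_bins(rd)
    print(f"cutoff={cutoff:.2f}, flagged bins: {hits.tolist()}")
```

Because the running median spans many more bins than a short CNV, it follows slow drift in the scores without absorbing the CNV itself. In this sketch, bins that sit just outside a CNV boundary can also be flagged, because their neighbour feature straddles the breakpoint.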