Abstract

Copy number variations (CNVs) are significant causes of many human cancers and genetic diseases. Detecting CNVs from next-generation sequencing (NGS) data has become a common way to analyze human diseases. However, effectively detecting insignificant CNVs remains a challenging task. In this study, we propose a new detection method, RKDOSCNV, to meet this need. RKDOSCNV uses kernel density estimation to evaluate the local kernel density of each read depth segment (RDS) over an expanded nearest-neighbor set (the k-nearest neighbors, reverse nearest neighbors, and shared nearest neighbors of each RDS), and assigns a relative kernel density outlier score (RKDOS) to each RDS. Based on the RKDOS profile, RKDOSCNV predicts candidate CNVs by choosing a reasonable threshold, and then uses a split-read approach to correct the boundaries of the candidate CNVs. The performance of RKDOSCNV is assessed by comparing it with several currently popular methods in experiments on simulated and real data at different tumor purity levels. The experimental results show that RKDOSCNV outperforms the compared methods. In summary, RKDOSCNV is a simple and effective method for detecting CNVs from whole genome sequencing (WGS) data, especially for samples with low tumor purity.
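The abstract describes the RKDOS scoring step only at a high level. The following is a minimal Python sketch that assumes an RDOS-style definition (relative density-based outlier score): each segment's extended neighborhood S(p) is the union of its k-nearest, reverse nearest, and shared nearest neighbors; its density is a Gaussian kernel estimate averaged over S(p) and p itself; and its score is the mean density of S(p) divided by its own density. The input (one normalized read-depth value per segment), the Gaussian kernel, the bandwidth h, the fixed threshold, and the names knn_indices, extended_neighbors, rkdos_scores, and call_candidates are illustrative assumptions, not the published RKDOSCNV implementation. Split-read boundary correction is not shown.

```python
# Hypothetical sketch: RDOS-style scoring is assumed; this is not the RKDOSCNV source code.
import math
from typing import List, Set


def knn_indices(values: List[float], k: int) -> List[List[int]]:
    """Indices of the k nearest neighbors of each segment (the segment itself excluded)."""
    n = len(values)
    result = []
    for i in range(n):
        # Sort the other segments by absolute read-depth difference; ties broken by index.
        others = sorted((abs(values[i] - values[j]), j) for j in range(n) if j != i)
        result.append([j for _, j in others[:k]])
    return result


def extended_neighbors(knn: List[List[int]]) -> List[Set[int]]:
    """S(p) = kNN(p) | RNN(p) | SNN(p) for every segment p."""
    n = len(knn)
    knn_sets = [set(neigh) for neigh in knn]
    rnn: List[Set[int]] = [set() for _ in range(n)]
    for i, neigh in enumerate(knn):
        for j in neigh:
            rnn[j].add(i)  # i lists j as a neighbor, so i is a reverse neighbor of j
    result = []
    for i in range(n):
        # Shared nearest neighbors: segments whose kNN set overlaps that of i.
        snn = {j for j in range(n) if j != i and knn_sets[i] & knn_sets[j]}
        result.append(knn_sets[i] | rnn[i] | snn)
    return result


def gaussian_kernel(u: float) -> float:
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def rkdos_scores(values: List[float], k: int = 3, h: float = 1.0) -> List[float]:
    """Relative kernel density outlier score for 1-D read depths (assumed definition)."""
    neighbors = extended_neighbors(knn_indices(values, k))
    density = []
    for i, s in enumerate(neighbors):
        members = s | {i}  # the density estimate includes the segment itself
        total = sum(gaussian_kernel((values[j] - values[i]) / h) / h for j in members)
        density.append(total / len(members))
    # Mean neighborhood density divided by the segment's own density:
    # scores well above 1 mark segments lying in sparser regions than their neighbors.
    return [
        sum(density[j] for j in s) / (len(s) * density[i])
        for i, s in enumerate(neighbors)
    ]


def call_candidates(scores: List[float], threshold: float) -> List[int]:
    """Indices of segments whose score exceeds a hypothetical fixed threshold."""
    return [i for i, score in enumerate(scores) if score > threshold]


if __name__ == "__main__":
    # Toy normalized depths: six segments near 2.0 and one amplified segment at 4.0.
    depths = [2.0, 2.1, 1.9, 2.0, 2.05, 1.95, 4.0]
    scores = rkdos_scores(depths, k=3, h=1.0)
    print(call_candidates(scores, threshold=1.5))  # prints [6]
```

With k = 3 and h = 1.0, the toy segment at depth 4.0 scores roughly 3.4 while the others stay below 1. The cutoff of 1.5 is arbitrary and merely stands in for the "reasonable threshold" that the abstract leaves unspecified; the brute-force O(n²) neighbor search is chosen for clarity, not speed.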

Highlights

  • The copy number variations (CNVs) of three real samples from the 1000 Genomes Project, as provided in the Database of Genomic Variants, were used as the ground truth to calculate the recall, precision, and F1-score of each method and thereby evaluate its performance

  • A new method called RKDOSCNV was presented for CNV detection using NGS data
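The highlights state that recall, precision, and F1-score were computed against the Database of Genomic Variants calls, but not how a predicted CNV is matched to a true one. A minimal sketch, assuming a hypothetical matching rule in which a prediction counts as correct when it overlaps any ground-truth CNV on the same chromosome by at least one base (many CNV benchmarks instead use base-pair or reciprocal-overlap criteria). The interval layout and the names overlaps and evaluate are illustrative assumptions.

```python
# Hypothetical evaluation sketch: the "any overlap" matching rule is an assumption.
from typing import List, Tuple

# (chromosome, start, end), half-open interval [start, end)
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two intervals lie on the same chromosome and share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def evaluate(predicted: List[Interval], truth: List[Interval]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) under the any-overlap matching rule."""
    # Precision: fraction of predicted CNVs that hit at least one true CNV.
    tp_pred = sum(1 for p in predicted if any(overlaps(p, t) for t in truth))
    # Recall: fraction of true CNVs hit by at least one prediction.
    tp_truth = sum(1 for t in truth if any(overlaps(t, p) for p in predicted))
    precision = tp_pred / len(predicted) if predicted else 0.0
    recall = tp_truth / len(truth) if truth else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall > 0 else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    truth = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 0, 50)]
    predicted = [("chr1", 150, 250), ("chr1", 700, 800)]
    print(evaluate(predicted, truth))  # precision 0.5, recall 1/3, F1 about 0.4
```

Counting matches separately on the predicted and ground-truth sides keeps one long prediction that spans several true CNVs from inflating precision beyond 1.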


Introduction

With the rapid development of next-generation sequencing (NGS) technology, many sequencing data sets that are used to detect and characterize human genome variation have been produced (Medvedev et al, 2009). Copy number variation (CNV) is one of the important forms of genome structural variation (Freeman et al, 2006). It has been reported that many human cancers and diseases are caused directly or indirectly by CNVs (Zhao et al, 2013). Accurate detection of CNVs from NGS data is therefore necessary to effectively discover disease-causing genes and develop targeted drugs (Yuan et al, 2018). According to the number of input samples, CNV detection methods can be divided into three categories: those that use multiple samples, those that use matched case-control samples, and those that use a single sample.
