Abstract
Copy number variation (CNV) is a prevalent type of genetic structural variation and underlies numerous hereditary diseases. Thorough identification and classification of CNVs are fundamental to providing a comprehensive view of the human genome and to discovering disease-associated genes. Next generation sequencing (NGS) has provided an abundance of data, which has accelerated the development of algorithms that identify CNVs at base-pair resolution. Nonetheless, the performance of these algorithms is often affected by several factors, including sequencing artifacts, GC bias, and correlations among neighboring positions within CNVs. Although a number of peer methods address some of these factors through explicit modeling, precise identification of low-amplitude CNVs remains a difficult task. In this paper, we propose an alternative computational method, CNV-KOF, to accurately detect CNVs across the full range of amplitudes from NGS data. The approach adopts an adaptive kernel density estimation (KDE)-based strategy and assigns a KDE-based outlier factor (KOF) to each genomic segment. Using the resulting outlier factor profile, CNV-KOF applies a box plot strategy to detect CNVs without depending on distribution assumptions. We tested CNV-KOF on simulated and real datasets and compared it with several peer methods. Experiments on both simulated and real sequencing data demonstrate that the proposed method outperforms the peer methods with respect to F1-score, sensitivity, and precision. Thus, CNV-KOF is expected to become a complementary tool for detecting CNVs, even in scenarios of low coverage and low tumor purity.
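The two-step idea described above (a KDE-based outlier score per segment, followed by a box plot cutoff) can be illustrated with a minimal sketch. This is not the published CNV-KOF implementation: the function names `kde_outlier_factor` and `boxplot_flags`, the neighborhood size `k`, the Gaussian kernel, the choice of the k-th neighbor distance as the adaptive bandwidth, and the use of only the upper box plot fence are all assumptions made for illustration. The input is assumed to be one read-depth value per genomic segment.

```python
import numpy as np

def kde_outlier_factor(depth, k=20, eps=1e-9):
    """Illustrative KDE-based outlier score per segment.

    Assumption: this is a generic local-density ratio, not the exact
    KOF formula from the paper. Requires k < len(depth).
    """
    x = np.asarray(depth, dtype=float)
    dist = np.abs(x[:, None] - x[None, :])      # pairwise read-depth distances
    np.fill_diagonal(dist, np.inf)              # a segment is not its own neighbor
    knn = np.argsort(dist, axis=1)[:, :k]       # indices of the k nearest segments
    knn_dist = np.take_along_axis(dist, knn, axis=1)
    h = knn_dist[:, -1] + eps                   # adaptive bandwidth: distance to k-th neighbor
    z = knn_dist / h[knn]                       # scale by each neighbor's own bandwidth
    kernel = np.exp(-0.5 * z ** 2) / (np.sqrt(2.0 * np.pi) * h[knn])
    density = kernel.mean(axis=1) + eps         # local Gaussian KDE estimate
    # Ratio of the neighbors' mean density to the segment's own density:
    # values near 1 look typical, large values suggest an outlying segment.
    return density[knn].mean(axis=1) / density

def boxplot_flags(scores, whisker=1.5):
    """Flag scores above the upper box plot fence, Q3 + whisker * IQR."""
    scores = np.asarray(scores, dtype=float)
    q1, q3 = np.percentile(scores, [25, 75])
    return scores > q3 + whisker * (q3 - q1)
```

A possible usage, on synthetic data with a short simulated gain:

```python
rng = np.random.default_rng(0)
depth = rng.normal(100.0, 5.0, 300)   # synthetic read depths, one per segment
depth[100:105] = 160.0                # hypothetical copy number gain
candidates = np.flatnonzero(boxplot_flags(kde_outlier_factor(depth)))
```

Because the cutoff comes from the quartiles of the scores themselves, no parametric distribution is assumed for the outlier factor. Note that `k` should exceed the number of segments in an expected CNV, otherwise the altered segments form their own dense neighborhood and score as typical.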