Abstract

Copy number variation (CNV) is a common phenomenon in tumor genomes and plays a significant role in tumorigenesis. Accurate detection of CNVs has become a routine and necessary step in the in-depth investigation of tumor cells and the diagnosis of tumor patients. Next-generation sequencing (NGS) provides a wealth of data for detecting CNVs at base-pair resolution. However, this task is influenced by a number of factors, including GC-content bias, sequencing errors, and correlations among adjacent positions within CNVs. Although many existing methods handle some of these artifacts with their own strategies, none considers all of the factors comprehensively. In this paper, we propose a new method, MFCNV, for the accurate detection of CNVs from NGS data. Compared with existing methods, the proposed method has the following characteristics: (1) it fully considers the intrinsic correlations among adjacent positions in the genome being analyzed, (2) it calculates the read depth, GC-content bias, base quality, and correlation value for each genome bin and combines them as multiple features for evaluating the bin, and (3) it addresses the joint effect of these factors by training a neural network to predict CNVs. We test the performance of MFCNV on simulated and real sequencing data and compare it with several peer methods. The results demonstrate that our method is superior to the other methods in terms of sensitivity, precision, and F1-score and can detect many CNVs that the other methods do not discover. MFCNV is expected to be a complementary tool for analyzing mutations in tumor genomes and can be extended to the analysis of single-cell sequencing data.

Highlights

  • Copy number variations (CNVs) are a type of structural variation that accounts for the majority of genomic mutations in the human genome

  • The method has four primary steps: (1) definition of factors related to CNVs, where four types of factors are calculated and their values are normalized, (2) construction of a neural network based on the factors, where a back-propagation (BP) neural network algorithm is selected, (3) training of the neural network, where labeled CNVs can be sampled from both synthetic and real sequencing datasets, and (4) prediction of CNVs and declaration of gains or losses, where the trained neural network predicts the CNV state and the CNV type of each genome bin

  • Accurate detection of CNVs is a crucial step for a comprehensive analysis of genomic mutations in the study of genome evaluation and complex human diseases
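
The four steps listed in the highlights can be illustrated with a minimal sketch. It is not the MFCNV implementation: the toy data generator, the z-score normalization, the single hidden layer of 8 tanh units, the learning rate, and the encoding of states as 0 = normal, 1 = gain, 2 = loss are all assumptions made for illustration, and every function name here is hypothetical. The sketch also folds the CNV call and the gain/loss declaration into one three-class prediction, which may differ from how the paper handles step (4).

```python
import numpy as np

STATES = ("normal", "gain", "loss")  # assumed label encoding: 0, 1, 2


def make_synthetic_bins(n_bins=300, seed=1):
    """Hypothetical toy data: one row per genome bin, four feature columns."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, 3, n_bins)
    mean_depth = np.array([30.0, 45.0, 15.0])[labels]  # normal, gain, loss
    features = np.column_stack([
        rng.normal(mean_depth, 3.0),     # read depth
        rng.uniform(0.3, 0.6, n_bins),   # GC content
        rng.normal(35.0, 2.0, n_bins),   # mean base quality
        rng.uniform(0.0, 1.0, n_bins),   # correlation with adjacent bins
    ])
    return features, labels


def standardize(features):
    """Step 1: normalize each factor (z-score is an assumption here)."""
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns unscaled
    return (features - mean) / std


def forward(x, params):
    """Forward pass: tanh hidden layer followed by a softmax output."""
    w1, b1, w2, b2 = params
    hidden = np.tanh(x @ w1 + b1)
    logits = hidden @ w2 + b2
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    return hidden, probs


def train_bp_network(x, labels, hidden_units=8, lr=0.2, epochs=3000, seed=0):
    """Steps 2 and 3: build the network and train it by back-propagation."""
    rng = np.random.default_rng(seed)
    params = [
        rng.normal(0.0, 0.5, (x.shape[1], hidden_units)),
        np.zeros(hidden_units),
        rng.normal(0.0, 0.5, (hidden_units, len(STATES))),
        np.zeros(len(STATES)),
    ]
    targets = np.eye(len(STATES))[labels]  # one-hot labels
    for _ in range(epochs):
        hidden, probs = forward(x, params)
        # Gradient of the mean cross-entropy loss w.r.t. the output logits.
        d_logits = (probs - targets) / len(x)
        # Propagate the error back through the tanh hidden layer.
        d_hidden = (d_logits @ params[2].T) * (1.0 - hidden ** 2)
        grads = [x.T @ d_hidden, d_hidden.sum(axis=0),
                 hidden.T @ d_logits, d_logits.sum(axis=0)]
        for p, g in zip(params, grads):
            p -= lr * g  # in-place gradient descent update
    return params


def predict_states(x, params):
    """Step 4: predict a state index (see STATES) for every bin."""
    _, probs = forward(x, params)
    return probs.argmax(axis=1)


features, labels = make_synthetic_bins()
x = standardize(features)
params = train_bp_network(x, labels)
predicted = predict_states(x, params)
print("accuracy on the training bins:", (predicted == labels).mean())
print("first five calls:", [STATES[s] for s in predicted[:5]])
```

The accuracy printed here is measured on the same bins used for training, so it only shows that the toy network fits the toy data. A real evaluation would hold out labeled bins, as the comparison on simulated and real datasets in the abstract implies.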


Summary

Introduction

Copy number variations (CNVs) are a type of structural variation that accounts for the majority of genomic mutations in the human genome. The recent development of next-generation sequencing (NGS) has provided an unprecedented opportunity to discover new CNVs. Compared with traditional chromosomal microarray technologies, including array comparative genomic hybridization and single nucleotide polymorphism genotyping arrays, NGS has several distinct advantages: high resolution, high efficiency, and reduced cost (Schuster, 2008; Ansorge, 2009, 2010). It is therefore attractive and promising for researchers to develop methods that detect CNVs and other types of genomic mutations from NGS data.

