Abstract

e13659 Background: Efficient and accurate identification of somatic variants is important for understanding the formation, progression, and treatment of cancer. Traditional variant calling pipelines require manual review in the Integrative Genomics Viewer (IGV), which imposes a heavy workload when evaluating tumors with a high variant burden. In this study, we developed a convolutional neural network (CNN) method for training somatic mutation identification models that is suitable for panel sequencing data across different tumor purities. Methods: A total of 1000 tumor samples were obtained from next-generation sequencing (NGS)-based genetic testing performed by a College of American Pathologists (CAP)-accredited and Clinical Laboratory Improvement Amendments (CLIA)-certified laboratory. Candidate mutation locations were identified with a variant calling program such as GATK and standardized by manual confirmation. For each candidate mutation location, reads from both tumor and control tissue were extracted. For each candidate base, a 2-dimensional feature matrix M of size (2k+1) × 32 was created. The 2k+1 rows correspond to the length of the candidate region, and the 32 columns encode read coverage frequency, mapping quality information, and local genome scores for the tumor and control tissues. A CNN with nine convolutional layers, based on Temporal Convolutional Networks (TCN) but with a structure modified to fit the proposed input matrix, was used for training. The training data set, consisting of manually validated sequence data, served as the benchmark, and the model was optimized with a Stochastic Gradient Descent (SGD) optimizer at a learning rate of 0.01. Results: The validation data set included 15 mixed samples composed of different proportions of known cell lines and real mixed blood samples. The pooled DNA contained 2,359 somatic variants, with expected variant allele frequencies ranging from 3% to 97% in each pool. The overall sensitivity and positive predictive value (PPV) for single nucleotide variants (SNVs) were 99.3% and 99.8%, respectively. Conclusions: A novel and sensitive computational tool for somatic variant detection in DNA panel sequencing was developed. Our results show that the deep learning CNN model can call variants in panel sequencing data.
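
The sensitivity and PPV reported above follow the standard definitions: sensitivity = TP / (TP + FN) and PPV = TP / (TP + FP), where calls are compared against a truth set of known variants. The minimal sketch below illustrates that comparison. It is not the study's evaluation code: the function name `evaluate_calls`, the `(chromosome, position, ref, alt)` tuple representation, and the toy variants are all hypothetical and chosen only for illustration.

```python
from typing import NamedTuple, Set, Tuple

# Hypothetical variant representation: (chromosome, position, ref allele, alt allele).
Variant = Tuple[str, int, str, str]


class Metrics(NamedTuple):
    true_positives: int
    false_positives: int
    false_negatives: int
    sensitivity: float
    ppv: float


def evaluate_calls(called: Set[Variant], truth: Set[Variant]) -> Metrics:
    """Compare called variants with a truth set by exact match."""
    tp = len(called & truth)   # called and present in the truth set
    fp = len(called - truth)   # called but absent from the truth set
    fn = len(truth - called)   # in the truth set but not called
    # Guard against empty sets so the ratios are never divided by zero.
    sensitivity = tp / (tp + fn) if truth else float("nan")
    ppv = tp / (tp + fp) if called else float("nan")
    return Metrics(tp, fp, fn, sensitivity, ppv)


# Toy data, invented for illustration only (not from the study).
truth = {
    ("chr1", 100, "A", "T"),
    ("chr1", 200, "G", "C"),
    ("chr2", 50, "C", "T"),
    ("chr3", 75, "T", "G"),
}
called = {
    ("chr1", 100, "A", "T"),
    ("chr1", 200, "G", "C"),
    ("chr2", 50, "C", "T"),
    ("chr5", 10, "A", "G"),
}

metrics = evaluate_calls(called, truth)
print(f"sensitivity={metrics.sensitivity:.3f}, PPV={metrics.ppv:.3f}")
```

Exact matching on position and alleles is the simplest comparison rule; a real evaluation would likely also need variant normalization before matching, which this sketch omits.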
