Abstract

Background

Calling genetic variations from sequence reads is an important problem in genomics. There are many existing methods for calling various types of variations. Recently, Google developed a method for calling single nucleotide polymorphisms (SNPs) based on deep learning. Their method visualizes sequence reads in the form of images. These images are then used to train a deep neural network model, which is used to call SNPs. This raises a research question: can deep learning be used to call more complex genetic variations, such as structural variations (SVs), from sequence data?

Results

In this paper, we extend this high-level approach to the problem of calling structural variations. We present DeepSV, an approach based on deep learning for calling long deletions from sequence reads. DeepSV is based on a novel method of visualizing sequence reads. The visualization is designed to capture multiple sources of information in the sequence data that are relevant to long deletions. DeepSV also implements techniques for working with noisy training data. DeepSV trains a model from the visualized sequence reads and calls deletions based on this model. We demonstrate that DeepSV outperforms existing methods in terms of accuracy and efficiency of deletion calling on the data from the 1000 Genomes Project.

Conclusions

Our work shows that deep learning can potentially lead to effective calling of different types of genetic variations that are more complex than SNPs.

Highlights

  • High-throughput DNA sequencing technologies have generated vast amounts of sequence data

  • The DeepVariant approach raises a natural research question: can deep learning be applied to call other types of genetic variations from sequence data that are more complex than single nucleotide polymorphisms (SNPs) and short indels? In this paper, we answer this question positively: we show that deep learning can be used to accurately call structural variations from sequence data

  • Our work extends the findings of DeepVariant by showing that deep learning can be useful for calling structural variations that are more complex than SNPs and short indels



Introduction

High-throughput DNA sequencing technologies have generated vast amounts of sequence data. One important use of these data is calling genetic variations such as SNPs or SVs. Genomic deletions affect several aspects (called signatures) of the sequence reads mapped onto the given reference genome near the deletion site. Consider paired-end reads that are mapped near a deletion, with the two ends falling on different sides of the deletion. Such a read pair is called an encompassing pair for the deletion. When a read overlaps the breakpoints of a deletion, the read consists of two parts that are not contiguous on the reference: the part preceding the left breakpoint and the part following the right breakpoint. Such a read is called a split read; it is mapped onto two discontinuous regions of the reference. These signatures reveal different aspects of structural variations.
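The two signatures described above can be illustrated with a minimal Python sketch. This is not DeepSV's implementation: the `Deletion` class, the function names, the half-open `(start, end)` interval convention, and the `tolerance` parameter are all hypothetical choices made for illustration, and real alignments would normally be read from a BAM file rather than passed in as tuples.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical convention: intervals are 0-based, half-open (start, end)
# coordinates on the reference genome.
Interval = Tuple[int, int]


@dataclass
class Deletion:
    start: int  # first deleted reference position (inclusive)
    end: int    # position just past the last deleted base (exclusive)


def is_encompassing_pair(mate1: Interval, mate2: Interval, deletion: Deletion) -> bool:
    """True if the two mates map on opposite sides of the deletion."""
    left, right = sorted([mate1, mate2])
    return left[1] <= deletion.start and right[0] >= deletion.end


def is_split_read(segments: List[Interval], deletion: Deletion, tolerance: int = 0) -> bool:
    """True if one read aligns as two pieces that meet the deletion breakpoints.

    `segments` holds the reference intervals of the aligned parts of a single
    read. `tolerance` (an assumed parameter) allows for small alignment
    offsets around each breakpoint.
    """
    if len(segments) != 2:
        return False
    first, second = sorted(segments)
    ends_at_left_breakpoint = abs(first[1] - deletion.start) <= tolerance
    starts_at_right_breakpoint = abs(second[0] - deletion.end) <= tolerance
    return ends_at_left_breakpoint and starts_at_right_breakpoint


if __name__ == "__main__":
    deletion = Deletion(start=1000, end=1500)
    print(is_encompassing_pair((800, 900), (1600, 1700), deletion))
    print(is_split_read([(950, 1000), (1500, 1550)], deletion))
```

A caller or classifier would typically combine such signatures rather than rely on either one alone, since each reveals a different aspect of the deletion.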

