Abstract

Motivation: Next-generation sequencing (NGS) technologies using DNA, RNA, or methylation sequencing are prevailing tools in modern genome research. For DNA sequencing, whole genome sequencing (WGS) and whole exome sequencing (WES) are two typical applications with different preferences in the trade-off between sequencing depth and base coverage. Although sequencing costs have fallen greatly, the sequencing depth used in WGS is typically lower than in WES (e.g., ∼35× vs. ∼100×). In addition, biases and batch effects may arise at different stages of an NGS experiment. Downstream analyses based on low-depth, biased WGS data are therefore more sensitive to these biases, which makes it harder to uncover real biological signals. In this work, we focus on reconstructing high-read-depth signals from low-depth WGS data. We use a pair of WGS datasets with different read depths for the same sample and learn a mapping from low-depth signals to high-depth signals on the given platform.

Results: We explored three reconstruction models, from shallow to deep. Our experimental results show that, when using read-depth information alone, deeper models do not perform much better than a linear regression model. By incorporating additional information, such as GC content, mappability, and nucleotide sequence, the performance of convolutional neural network (CNN) models can be further improved. We used the reconstructed read-depth signals in downstream analysis to identify copy number variation segments for a single sample. The results show that segments not detected in the low-depth data can be detected from the signals reconstructed by the CNN model using the extra biological information.

Availability: The source code will be available at https://github.com/yaozhong/DLRec
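The linear-regression baseline mentioned above can be illustrated with a minimal sketch: fit a per-bin mapping from a low-depth read-depth track to its high-depth counterpart and apply it to held-out bins. The data below are simulated (Poisson counts over genomic bins) purely for illustration; the bin size, depths, and train/test split are assumptions, not the authors' actual pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical high-depth read-depth signal over 1000 genomic bins (~100x).
n_bins = 1000
high_depth = rng.poisson(lam=100.0, size=n_bins).astype(float)

# Low-depth counterpart (~35x): a downsampled, noisier view of the same sample.
scale = 35.0 / 100.0
low_depth = rng.poisson(lam=high_depth * scale).astype(float)

# Linear-regression baseline: learn high ~ a * low + b on training bins.
train, test = slice(0, 800), slice(800, None)
X_train = np.column_stack([low_depth[train], np.ones(800)])
coef, *_ = np.linalg.lstsq(X_train, high_depth[train], rcond=None)

# Reconstruct the high-depth signal on held-out bins.
X_test = np.column_stack([low_depth[test], np.ones(200)])
reconstructed = X_test @ coef

# Mean absolute error against the true high-depth signal,
# before (raw low-depth) and after reconstruction.
mae_raw = np.mean(np.abs(low_depth[test] - high_depth[test]))
mae_rec = np.mean(np.abs(reconstructed - high_depth[test]))
```

Even this two-parameter model removes the systematic depth offset; the CNN models described in the abstract extend the same idea by conditioning on local context such as GC content, mappability, and sequence.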
