Abstract

Over 98% of the human genome is non-coding, and 93% of disease-associated variants are located in these regions. Understanding the function of these regions is therefore important, but the task is challenging because the functions of most of them are not well understood. In this paper, we introduce a novel computational model based on deep neural networks, called DQDNN, for quantifying the function of non-coding DNA regions. The model combines convolution layers, which capture regulatory motifs at multiple scales, with recurrent layers, which capture long-term dependencies between the captured motifs. In addition, we show that integrating evolutionary information with raw genomic sequences significantly improves the performance of the predictor. The proposed model outperforms the state-of-the-art models both when using raw genomic sequences only and when integrating evolutionary information with raw genomic sequences. More specifically, the proposed model improves 96.9% and 98% of the targets in terms of the area under the receiver operating characteristic curve and the area under the precision-recall curve, respectively. The proposed model also improves the prioritization of functional variants of expression quantitative trait loci (eQTLs) compared with the state-of-the-art models.
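To make the per-target comparison in the abstract concrete, the following minimal sketch computes the area under the receiver operating characteristic curve (AUROC) for one target and then the fraction of targets on which one model scores higher than another. It assumes that "improves X% of the targets" means the share of targets with a strictly higher per-target score; the abstract does not spell this out. The function names and the toy numbers are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def fraction_improved(baseline: Sequence[float], proposed: Sequence[float]) -> float:
    """Share of targets where the proposed model's score is strictly higher.

    Assumption: this is how a "percentage of targets improved" would be counted.
    """
    if len(baseline) != len(proposed) or not baseline:
        raise ValueError("both models need one score per target")
    improved = sum(1 for b, p in zip(baseline, proposed) if p > b)
    return improved / len(baseline)


# Toy example for a single target (made-up labels and scores).
toy_auroc = auroc([1, 0, 1, 0], [0.9, 0.2, 0.4, 0.6])

# Toy per-target AUROC values for two models over three targets (made-up).
toy_fraction = fraction_improved([0.80, 0.90, 0.70], [0.85, 0.88, 0.75])
```

The pairwise formulation is quadratic in the number of examples, so it is only suitable for illustration; a rank-based implementation would be the usual choice at genome scale.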

Highlights

  • The availability of high-throughput sequential data has attracted researchers to develop deep learning algorithms that can learn efficiently from large datasets [1]

  • A genome-wide association study revealed over 6500 disease- or trait-predisposing single nucleotide polymorphisms (SNPs), 93% of which are located in non-coding regions [2]

  • The evaluation results show that the proposed model outperforms the current state-of-the-art models both when using raw DNA sequences only and when integrating evolutionary information with raw DNA sequences


Summary

Introduction

The availability of high-throughput sequential data has attracted researchers to develop deep learning algorithms that can learn efficiently from large datasets [1]. Having a computational model that can predict the function of a non-coding DNA region from raw genomic data is important. Recent research has aimed to predict the function of genomic sequences directly from the raw genomic data instead of handcrafting features. In this regard, deep learning algorithms have produced remarkable results, as they are able to automatically learn complex patterns from large datasets. Zhou and Troyanskaya proposed the DeepSEA model, in which they used a convolutional neural network to capture the important motifs in raw DNA sequences. Their model was simple: it contained three consecutive convolution layers followed by fully connected layers for classification. The prioritization of functional variants of expression quantitative trait loci (eQTLs) was improved compared with the state-of-the-art models.
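As an illustration of how a convolution layer can pick up a motif in raw DNA, the sketch below one-hot encodes a sequence and slides a single filter over it, followed by a ReLU. It is a minimal example of the general technique, not the DeepSEA or DQDNN implementation: the A/C/G/T channel order, the hand-set "TATA" filter, the bias value, and the function names are all assumptions made for illustration, and a trained network would learn its filter weights from data.

```python
from typing import List

ALPHABET = "ACGT"  # assumed channel order


def one_hot(seq: str) -> List[List[float]]:
    """Encode a DNA string as a len(seq) x 4 matrix; unknown bases such as N become all zeros."""
    return [[1.0 if base == letter else 0.0 for letter in ALPHABET] for base in seq.upper()]


def conv1d_relu(x: List[List[float]], kernel: List[List[float]], bias: float = 0.0) -> List[float]:
    """Slide one filter over the sequence (no padding, stride 1) and apply ReLU."""
    width = len(kernel)
    activations = []
    for start in range(len(x) - width + 1):
        score = bias
        for offset in range(width):
            for channel in range(len(ALPHABET)):
                score += x[start + offset][channel] * kernel[offset][channel]
        activations.append(max(0.0, score))
    return activations


# Hypothetical hand-set filter: it matches the 4-mer "TATA" exactly.
tata_filter = one_hot("TATA")

# With bias -3.0, only a window matching all four positions stays above zero after ReLU.
activations = conv1d_relu(one_hot("GGTATAAC"), tata_filter, bias=-3.0)
best_position = max(range(len(activations)), key=activations.__getitem__)
```

Here the filter fires only at the window starting at index 2, where the sequence reads TATA. Stacking several such layers, as in the three-layer design described above, lets later filters respond to combinations of the motifs detected by earlier ones.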

Materials
The Proposed Model
Functional SNP Prioritization
The Performance of the DQDNN Model
The Performance of the Functional SNP Prioritization Model
Conclusions