Abstract

Translation initiation site (TIS) prediction is one of the most crucial subtasks in gene annotation. Many computational methods have been proposed and achieve acceptable accuracy on transcripts (e.g., cDNA, mRNA). However, predicting TIS at the genome level is far more challenging, and computational methods for TIS prediction in genomic sequences have so far reached only modest performance. Recently, we proposed a method that improves TIS prediction in mRNA sequences and demonstrated the significance of explicitly modeling coding features. In this paper, we extend that work to genomic sequences and present a two-stage deep learning model for TIS prediction. The first stage extracts coding contrast features around the TIS with a hybrid Convolutional Neural Network-Bidirectional Recurrent Neural Network architecture (Content-RCNN). The second stage integrates these coding contrast features with the one-hot encoded TIS sequence and predicts the TIS jointly with a CNN (Integrated-CNN). Four-fold cross-validation tests on genome-wide human and mouse datasets demonstrate that the proposed model improves TIS prediction performance over existing state-of-the-art methods. The source code and the dataset used in the paper are publicly available at: https://github.com/xdcwei/DeepTIS/.
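
To make the one-hot input of the second stage concrete, the following is a minimal sketch of how a genomic window around a candidate ATG might be extracted and one-hot encoded. It is not taken from the DeepTIS repository: the names `ALPHABET`, `one_hot_encode`, `candidate_windows`, and `flank`, the A/C/G/T channel order, and the all-zero handling of ambiguous bases are assumptions made for illustration only.

```python
from typing import Iterator, List, Tuple

# Channel order is an assumption; the abstract does not specify it.
ALPHABET = "ACGT"


def one_hot_encode(seq: str) -> List[List[int]]:
    """Return one 4-element row per nucleotide.

    Bases outside ALPHABET (e.g. N) are encoded as all zeros,
    which is an illustrative choice, not the paper's stated rule.
    """
    rows = []
    for base in seq.upper():
        row = [0] * len(ALPHABET)
        idx = ALPHABET.find(base)
        if idx >= 0:
            row[idx] = 1
        rows.append(row)
    return rows


def candidate_windows(genome: str, flank: int) -> Iterator[Tuple[int, str]]:
    """Yield (position, window) for every ATG that has `flank` bases on both sides."""
    genome = genome.upper()
    start = genome.find("ATG")
    while start != -1:
        lo, hi = start - flank, start + 3 + flank
        # Skip candidates too close to either end of the sequence.
        if lo >= 0 and hi <= len(genome):
            yield start, genome[lo:hi]
        start = genome.find("ATG", start + 1)


if __name__ == "__main__":
    for pos, window in candidate_windows("CCATGGCATGA", flank=2):
        print(pos, window, one_hot_encode(window))
```

In a pipeline of the kind described above, each encoded window would be one input to the classifier, with the coding contrast features from the first stage supplied alongside it.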
