DeepTIS: Improved translation initiation site prediction in genomic sequence via a two-stage deep learning model

Chao Wei, Junying Zhang, Yuan Xiguo

doi:10.1016/j.dsp.2021.103202

Abstract

Translation initiation site (TIS) prediction is one of the most crucial subtasks in gene annotation. Many computational methods have been proposed that achieve acceptable accuracy on transcripts (e.g., cDNA, mRNA). However, TIS prediction at the genome level is far more challenging, and computational methods for TIS prediction in genomic sequences have so far achieved only modest performance. We recently proposed a method that improves TIS prediction in mRNA sequences and demonstrated the importance of explicitly modeling coding features. In this paper, we extend this approach to genomic sequences and present a two-stage deep learning model for TIS prediction in genomic sequence. In the first stage, a hybrid Convolutional Neural Network-Bidirectional Recurrent Neural Network architecture (Content-RCNN) extracts coding contrast features around the TIS. In the second stage, a CNN (Integrated-CNN) combines these coding contrast features with the one-hot-encoded TIS sequence to jointly predict the TIS. Four-fold cross-validation tests on genome-wide human and mouse datasets demonstrate that the proposed model improves TIS prediction performance over existing state-of-the-art methods. The source code and the dataset used in the paper are publicly available at: https://github.com/xdcwei/DeepTIS/.
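To make the "TIS sequence encoded by one-hot encoding" input concrete, the sketch below scans a genomic string for candidate sites and one-hot encodes a fixed window around each. It rests on several assumptions not stated in the abstract: candidates are taken to be ATG codons (common in TIS prediction), the channel order is A, C, G, T, ambiguous bases such as N become all-zero rows, and the flank length is a free parameter. The names `one_hot` and `candidate_windows` are hypothetical, and this is not the DeepTIS preprocessing code.

```python
from typing import List, Tuple

# Assumed channel order for the one-hot encoding (hypothetical choice).
NUCLEOTIDES = "ACGT"


def one_hot(seq: str) -> List[List[int]]:
    """Encode a DNA string as a len(seq) x 4 one-hot matrix.

    Each row has a 1 in the column of its base (A, C, G, T order).
    Ambiguous bases (e.g. 'N') map to an all-zero row.
    """
    return [[1 if base == n else 0 for n in NUCLEOTIDES] for base in seq.upper()]


def candidate_windows(genome: str, flank: int) -> List[Tuple[int, str]]:
    """Return (position, window) for every ATG whose flanks fit in the genome.

    The window covers `flank` bases upstream of the A, the ATG itself,
    and `flank` bases downstream of the G, so its length is 2 * flank + 3.
    Candidates too close to either end of the sequence are skipped.
    """
    genome = genome.upper()
    windows = []
    # i is the index of the 'A'; the last valid i satisfies i + 3 + flank == len(genome).
    for i in range(flank, len(genome) - flank - 2):
        if genome[i:i + 3] == "ATG":
            windows.append((i, genome[i - flank:i + 3 + flank]))
    return windows
```

For example, `candidate_windows("CCGATGAAATGCC", flank=2)` yields the ATGs at positions 3 and 8 with windows `"CGATGAA"` and `"AAATGCC"`; passing each window to `one_hot` gives a 7 x 4 matrix that a 1D CNN could consume. A real pipeline would also need to handle the reverse strand and label true versus false sites, which this sketch leaves out.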

Full Text