Abstract

Long non-coding RNA (lncRNA) plays an important role in cells through the interaction with RNA-binding proteins (RBPs). Finding the RBPs binding sites on the lncRNA chains can help to understand the post-transcriptional regulatory mechanism, exploring the pathogenesis of cancers and possible roles in other diseases. Although many genome-wide RBP experimental techniques can identify the RNA-protein interactions and detect the binding sites on RNA chains, they are still time-consuming, labor-intensive and cost-heavy. Thus, many computational methods have been developed to predict the RBPs sites by integrating the RNA sequence, structure and domain specific features, etc. However, current approaches that focus on predicting the RBPs binding sites on RNA chains lack a consideration of the dependencies among nucleotides. In this work, we propose a higher-order nucleotide encoding convolutional neural network-based method (namely HOCNNLB) to predict the RBPs binding sites on lncRNA chains. HOCNNLB first employs a high-order one-hot encoding strategy to encode the lncRNA sequences by considering the dependence among nucleotides, then the encoded lncRNA sequences are fed into the convolutional neural network (CNN) to predict the RBP binding sites. We evaluate HOCNNLB on 31 experimental datasets of 12 lncRNA binding proteins. The average AUC of HOCNNLB achieves 0.953, which is 0.247, 0.175 higher than that of iDeepS and DeepBind, respectively. The average accuracy is 90.2%, which is 26.8%, 19.5% higher than that of iDeepS and DeepBind, respectively. These results demonstrate that HOCNNLB can reliably predict the RBP binding sites on lncRNA chains and outperforms the state-of-the-art methods. The source code of HOCNNLB and the datasets used in this work are available at https://github.com/NWPU-903PR/HOCNNLB for academic users.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call