Abstract

RNA-binding proteins (RBPs) play a significant role in various regulatory tasks. However, the mechanism by which RBPs identify their target RNA subsequences is still not clear. In recent years, several machine learning and deep learning-based computational models have been proposed for understanding the binding preferences of RBPs. These methods require integrating multiple features, such as secondary structure, with the raw RNA sequences, and their performance can be further improved. In this paper, we propose an efficient and simple convolutional neural network, RBPCNN, that relies on the combination of the raw RNA sequence and evolutionary information. We show that conservation scores (evolutionary information) for the RNA sequences can significantly improve the overall performance of the proposed predictor. In addition, the automatic extraction of the binding sequence motifs can enhance our understanding of the binding specificities of RBPs. The experimental results show that RBPCNN significantly outperforms the current state-of-the-art methods. More specifically, the average area under the receiver operating characteristic curve (AUC) was improved by 2.67 percent and the mean average precision was improved by 8.03 percent. The datasets and results can be downloaded from https://home.jbnu.ac.kr/NSCL/RBPCNN.htm.
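As a rough illustration of what combining the raw RNA sequence with evolutionary information could look like at the input stage, the sketch below builds one row per nucleotide from a one-hot encoding plus a per-position conservation score. This is an assumption made for illustration, not the encoding used by RBPCNN: the function name `encode_with_conservation`, the nucleotide order `ACGU`, the extra fifth column, and the handling of unknown bases are all hypothetical.

```python
from typing import List, Sequence

ALPHABET = "ACGU"  # assumed nucleotide order; not taken from the paper


def encode_with_conservation(seq: str, scores: Sequence[float]) -> List[List[float]]:
    """Return one row per nucleotide: 4 one-hot values followed by 1 conservation score."""
    if len(seq) != len(scores):
        raise ValueError("sequence and conservation scores must have equal length")
    rows = []
    # DNA-style input is tolerated by mapping T to U (an assumption).
    for base, score in zip(seq.upper().replace("T", "U"), scores):
        # Unknown bases such as N become an all-zero one-hot vector (an assumption).
        one_hot = [1.0 if base == letter else 0.0 for letter in ALPHABET]
        rows.append(one_hot + [float(score)])
    return rows


if __name__ == "__main__":
    # Hypothetical 5-nucleotide window with made-up conservation scores.
    for row in encode_with_conservation("ACGUN", [0.9, 0.1, 0.5, 0.0, 0.3]):
        print(row)
```

A matrix of this shape (sequence length by 5) is the kind of input a one-dimensional convolutional layer could scan for motifs; whether the conservation scores are best added as an extra channel or fed through a separate branch is a design choice this sketch does not settle.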

Highlights

  • An RNA binding site, or binding motif, is a subsequence of RNA where the binding between an RNA-binding protein (RBP) and its RNA target takes place

  • We study the performance of the proposed model RBPCNN and compare it with the state-of-the-art models

  • The box plots for the 31 experiments for the proposed model RBPCNN and the competing methods are shown in Fig. 6 for AUC and Fig. 7 for average precision score (AP)

Summary

INTRODUCTION

An RNA binding site, or binding motif, is a subsequence of RNA where the binding between the RBP and its RNA target takes place. The DeepCpG model was proposed to study CpG sites [36]. All of these successful examples have shown that deep learning can effectively extract features automatically from raw genomic sequences and provide better outcomes in terms of prediction and analysis. Several deep learning-based models have been proposed for predicting RNA-protein binding sites. A hybrid deep learning model using a codon encoding method was proposed in [44] for RNA binding site prediction. Other deep belief network and CNN models used different representations, such as motifs and RNA structures, to generate a shared representation.

Materials
Encoding Sequence and Evolutionary Information
The Proposed Model
RESULTS
Method
The Learned Motifs
CONCLUSION