Multi-resBind: a residual network-based multi-label classifier for in vivo RNA binding prediction and preference visualization

Shitao Zhao, Michiaki Hamada

doi:10.1186/s12859-021-04430-y

Abstract

Background: Protein-RNA interactions play key roles in many processes regulating gene expression. To understand the underlying binding preference, ultraviolet cross-linking and immunoprecipitation (CLIP)-based methods have been used to identify the binding sites for hundreds of RNA-binding proteins (RBPs) in vivo. Using these large-scale experimental data to infer RNA binding preference and predict missing binding sites has become a great challenge. Some existing deep-learning models have demonstrated high prediction accuracy for individual RBPs. However, it remains difficult to avoid significant bias due to the experimental protocol. The DeepRiPe method was recently developed to solve this problem by introducing multi-task or multi-label learning into this field. However, this method has not reached an ideal level of prediction power due to its weak neural network architecture.

Results: Compared to the DeepRiPe approach, our Multi-resBind method demonstrated substantial improvements on the same large-scale PAR-CLIP dataset, with increases in both the area under the receiver operating characteristic curve and average precision. We conducted extensive experiments to evaluate the impact of various types of input data on the final prediction accuracy. The same approach was used to evaluate the effect of loss functions. Finally, a modified integrated gradient was employed to generate attribution maps. The patterns disentangled from relative contributions according to context offer biological insights into the underlying mechanism of protein-RNA interactions.

Conclusions: Here, we propose Multi-resBind as a new multi-label deep-learning approach to infer protein-RNA binding preferences and predict novel interactions. The results clearly demonstrate that Multi-resBind is a promising tool to predict unknown binding sites in vivo and gain biological insights into why the neural network makes a given prediction.

Highlights

Protein-RNA interactions play key roles in many processes regulating gene expression
We evaluated the performance of Multi-resBind in terms of the area under the receiver operating characteristic curve (AUROC) and average precision (AP) relative to those of DeepRiPe using the same large-scale PAR-CLIP datasets
Although some region information was lost to equalize the length of the input sequences, our method still outperformed DeepRiPe for each RNA-binding protein (RBP), as shown in Fig. 2c, d. These results suggest that using a residual network (ResNet) with skip connections to build deeper networks could be a more potent approach to prediction problems in related biological research fields
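
To make the per-RBP comparison concrete, the following is a minimal sketch of how AUROC and AP can be computed separately for each label of a multi-label classifier. It is not the authors' evaluation code. The function names (`auroc`, `average_precision`, `per_label_metrics`) and the toy data are assumptions made for illustration, and the AP routine assumes there are no tied scores.

```python
from typing import List, Sequence, Tuple


def auroc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Mean of the precision values at the rank of each positive (assumes no tied scores)."""
    order = sorted(range(len(y_score)), key=lambda i: -y_score[i])
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            total += hits / rank
    if hits == 0:
        raise ValueError("average precision needs at least one positive example")
    return total / hits


def per_label_metrics(
    Y_true: Sequence[Sequence[int]], Y_score: Sequence[Sequence[float]]
) -> List[Tuple[float, float]]:
    """Return (AUROC, AP) for every label column, i.e. one pair per RBP."""
    results = []
    for j in range(len(Y_true[0])):
        col_true = [row[j] for row in Y_true]
        col_score = [row[j] for row in Y_score]
        results.append((auroc(col_true, col_score), average_precision(col_true, col_score)))
    return results


# Hypothetical toy data: 4 input sequences (rows) x 2 RBPs (columns).
Y_true = [[1, 0], [0, 1], [1, 1], [0, 0]]
Y_score = [[0.9, 0.2], [0.3, 0.8], [0.6, 0.4], [0.1, 0.5]]
metrics = per_label_metrics(Y_true, Y_score)
```

Reporting the two metrics per label, rather than one pooled number, is what allows a statement such as "outperformed DeepRiPe for each RBP" to be checked label by label.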

Summary

Introduction

Protein-RNA interactions play key roles in many processes regulating gene expression. To understand the underlying binding preference, ultraviolet crosslinking and immunoprecipitation (CLIP)-based methods have been used to identify the binding sites for hundreds of RNA-binding proteins (RBPs) in vivo. Using these large-scale experimental data to infer RNA binding preference and predict missing binding sites has become a great challenge. The DeepRiPe method was recently developed to solve this problem by introducing multi-task or multi-label learning into this field. This method has not reached an ideal level of prediction power due to its weak neural network architecture. The combined ultraviolet crosslinking and immunoprecipitation with sequencing (CLIP-seq) method was developed to measure genome-wide protein-RNA interactions in different cellular environments [9,10,11,12]. Sequence bias induced by the RNase T1 enzyme is a common limitation in many CLIP-seq experiments [13].
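
The multi-label formulation mentioned above treats each RBP as an independent binary output for the same input sequence. The sketch below shows one common loss for that setting, a mean binary cross-entropy over per-label sigmoid outputs. The text here does not state which loss functions the authors compared, so the choice of binary cross-entropy, along with the names `sigmoid` and `multilabel_bce`, is an assumption made purely for illustration.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def multilabel_bce(logits: Sequence[float], targets: Sequence[int]) -> float:
    """Mean binary cross-entropy over independent labels (one label per RBP).

    Assumption: this is a generic multi-label loss, not necessarily the one
    used in Multi-resBind or DeepRiPe.
    """
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for z, t in zip(logits, targets):
        p = sigmoid(z)
        total -= t * math.log(p + eps) + (1 - t) * math.log(1.0 - p + eps)
    return total / len(logits)


# Hypothetical example: one sequence scored against three RBPs.
targets = [1, 0, 1]
good_loss = multilabel_bce([3.0, -3.0, 2.0], targets)   # logits agree with targets
bad_loss = multilabel_bce([-3.0, 3.0, -2.0], targets)   # logits contradict targets
```

Because each label has its own sigmoid, a sequence can be predicted as bound by several RBPs at once, which a single softmax over all RBPs would not allow.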

Methods

Results

Discussion

Conclusion