Predicting protein-binding regions in RNA using nucleotide profiles and compositions

Daesik Choi, Kyungsook Han, Wook Lee, Byungkyu Park, Hanju Chae

doi:10.1186/s12918-017-0386-4

Open Access

Journal: BMC Systems Biology
Publication Date: Mar 1, 2017
Citations: 18
License type: open-access

Affiliation: Inha University

Abstract

Background: Motivated by the increased amount of data on protein-RNA interactions and the availability of complete genome sequences of several organisms, many computational methods have been proposed to predict binding sites in protein-RNA interactions. However, most computational methods are limited to finding RNA-binding sites in proteins instead of protein-binding sites in RNAs. Predicting protein-binding sites in RNA is more challenging than predicting RNA-binding sites in proteins. Recent computational methods for finding protein-binding sites in RNAs have several drawbacks for practical use.

Results: We developed a new support vector machine (SVM) model for predicting protein-binding regions in mRNA sequences. The model uses sequence profiles constructed from log-odds scores of mono- and di-nucleotides and nucleotide compositions. The model was evaluated by standard 10-fold cross validation, leave-one-protein-out (LOPO) cross validation and independent testing. Since actual mRNA sequences have more non-binding regions than protein-binding regions, we tested the model on several datasets with different ratios of protein-binding regions to non-binding regions. The best performance of the model was obtained in a balanced dataset of positive and negative instances. 10-fold cross validation with a balanced dataset achieved a sensitivity of 91.6%, a specificity of 92.4%, an accuracy of 92.0%, a positive predictive value (PPV) of 91.7%, a negative predictive value (NPV) of 92.3% and a Matthews correlation coefficient (MCC) of 0.840. LOPO cross validation showed a lower performance than the 10-fold cross validation, but the performance remained high (87.6% accuracy and 0.752 MCC). In testing the model on independent datasets, it achieved an accuracy of 82.2% and an MCC of 0.656. Testing our model and other state-of-the-art methods on the same dataset showed that our model performed better than the others.

Conclusions: Sequence profiles of log-odds scores of mono- and di-nucleotides were much more powerful features than nucleotide compositions in finding protein-binding regions in RNA sequences. However, a slight performance gain was obtained when using the sequence profiles along with nucleotide compositions. These are preliminary results of ongoing research, but they demonstrate the potential of our approach as a powerful predictor of protein-binding regions in RNA. The program and supporting data are available at http://bclab.inha.ac.kr/RBPbinding.

Highlights

Motivated by the increased amount of data on protein-RNA interactions and the availability of complete genome sequences of several organisms, many computational methods have been proposed to predict binding sites in protein-RNA interactions.
Mono-nucleotide position weight matrix (mPWM) and di-nucleotide position weight matrix (dPWM) features were much better than nucleotide compositions.
With mPWM or dPWM alone, the support vector machine (SVM) model achieved an accuracy above 89% and a Matthews correlation coefficient (MCC) above 0.79.
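To make the idea of a position weight matrix of log-odds scores concrete, here is a minimal sketch that builds one for mono-nucleotides (k=1) or di-nucleotides (k=2) from equal-length RNA windows and then turns a window into a profile vector. This is an illustration under stated assumptions, not the authors' construction: the function names `log_odds_pwm` and `profile`, the uniform background frequency, the pseudocount of 1 and the toy windows are all hypothetical choices, since this page does not describe how the paper derives its scores.

```python
import math
from itertools import product

ALPHABET = "ACGU"


def log_odds_pwm(windows, k=1, pseudocount=1.0):
    """Build a per-position log-odds matrix for k-mers (k=1 mono, k=2 di).

    Assumes all windows have the same length and contain only A, C, G, U.
    The background is uniform over all k-mers (an assumption).
    """
    length = len(windows[0])
    if any(len(w) != length for w in windows):
        raise ValueError("all windows must have the same length")
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    background = 1.0 / len(kmers)
    pwm = []
    for pos in range(length - k + 1):
        # Start every k-mer at the pseudocount to avoid log(0).
        counts = {kmer: pseudocount for kmer in kmers}
        for w in windows:
            counts[w[pos:pos + k]] += 1
        total = sum(counts.values())
        pwm.append({kmer: math.log2((c / total) / background)
                    for kmer, c in counts.items()})
    return pwm


def profile(window, pwm, k=1):
    """Map a window to its vector of per-position log-odds scores."""
    return [pwm[pos][window[pos:pos + k]] for pos in range(len(pwm))]


# Toy binding-region windows (hypothetical data).
windows = ["ACGU", "ACGA", "ACGG"]
mpwm = log_odds_pwm(windows, k=1)
dpwm = log_odds_pwm(windows, k=2)
features = profile("ACGU", mpwm, k=1) + profile("ACGU", dpwm, k=2)
print(len(features), features[0] > 0)
```

A feature vector of this kind could then be passed to an SVM, optionally concatenated with nucleotide composition frequencies, which is the combination the highlights compare.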

Summary

Introduction

Motivated by the increased amount of data on protein-RNA interactions and the availability of complete genome sequences of several organisms, many computational methods have been proposed to predict binding sites in protein-RNA interactions. Recent computational methods for finding protein-binding sites in RNAs have several drawbacks for practical use. Recent advances in high-throughput experimental technologies, including next-generation sequencing and crosslinking and immunoprecipitation (CLIP), have accelerated the discovery of RNA-binding proteins (RBPs) and their target RNAs. Despite the increased number of known RBPs and target RNAs, the mechanism of protein-RNA interactions is not fully understood, and a large number of RBPs and their target RNAs remain to be discovered. Computational methods have been proposed as a complement to experimental methods, and in general they are much less time-consuming and costly.
