Abstract
Identifying the factors that determine RBP-RNA interactions remains a major challenge. These interactions involve sparse binding motifs and a suitable sequence context for binding. The present work describes an approach to detect RBP binding sites in RNAs using an ultra-fast inexact k-mer search for statistically significant seeds. Each seed works as an anchor for evaluating the context and binding potential from flanking-region information using a deep feed-forward neural network. The developed models also received support from MD-simulation studies. The implemented software, RBPSpot, scored consistently high on all performance metrics, including an average accuracy of ∼90% across a large number of validated datasets. It outperformed the compared tools, including some with much more complex deep-learning models, in a comprehensive benchmarking process. RBPSpot can identify RBP binding sites in the human system and can also be used to develop new models, making it a valuable resource for regulatory system studies.
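The seed-and-flank idea described above can be illustrated with a minimal sketch. It is not RBPSpot's implementation. It assumes that "inexact" means a bounded number of mismatches (Hamming distance) and that the context is a fixed-width window around each seed hit. The function names, the `max_mismatches` and `flank` parameters, and the `N` padding character are all hypothetical. The test for statistical significance of seeds and the neural network itself are not shown.

```python
from typing import List, Tuple


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def find_seed_hits(seq: str, seed: str, max_mismatches: int = 1) -> List[Tuple[int, int]]:
    """Return (start, mismatches) for every window of seq that matches
    the seed k-mer with at most max_mismatches substitutions.

    Assumption: inexact matching is modelled as Hamming distance only
    (no insertions or deletions).
    """
    k = len(seed)
    hits = []
    for start in range(len(seq) - k + 1):
        d = hamming(seq[start:start + k], seed)
        if d <= max_mismatches:
            hits.append((start, d))
    return hits


def flanking_window(seq: str, start: int, k: int, flank: int, pad: str = "N") -> str:
    """Extract the seed hit plus `flank` bases on each side.

    Positions that fall outside the sequence are filled with `pad`, so
    every window has length k + 2 * flank (convenient as fixed-size
    input to a downstream classifier).
    """
    left = max(0, start - flank)
    right = min(len(seq), start + k + flank)
    left_pad = pad * (flank - (start - left))
    right_pad = pad * (flank - (right - (start + k)))
    return left_pad + seq[left:right] + right_pad


# Hypothetical example sequence and seed, for illustration only.
rna = "AAUGCAUGGAUGCAA"
seed = "UGCAUG"
hits = find_seed_hits(rna, seed, max_mismatches=1)
windows = [flanking_window(rna, start, len(seed), flank=4) for start, _ in hits]
```

In a pipeline of this kind, each fixed-length window would then be encoded numerically and scored by a classifier. Which features are used for that step is not specified in this summary.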
Highlights
The seeds work as anchors for evaluating the context and binding potential from flanking-region information using a deep feed-forward neural network
Advances in high-throughput techniques like CLIP-seq and Interactome Capture have drastically revised our understanding of RBPs, suggesting that the human system is expected to have at least 1,500–2,000 genes coding for RBPs (Gerstberger et al., 2014; Castello et al., 2015)
[…] if trained with carefully selected properties, are capable of outperforming complex models, which typically work better on unstructured data (Ryan 2021, https://towardsdatascience.com/the-unreasonable-ineffectiveness-of-deep-learning-ontabular-data-fd784ea29c33). Considering all these factors, here we present an efficient Deep Neural Net (DNN) based approach to build mechanistic models of RBP-RNA interactions from high-throughput cross-linking data, considering data for 137 human RBPs from 99 experiments
Summary
Advances in high-throughput techniques like CLIP-seq and Interactome Capture have drastically revised our understanding of RBPs, suggesting that the human system is expected to have at least 1,500–2,000 genes coding for RBPs (Gerstberger et al., 2014; Castello et al., 2015). Using general motif discovery tools to identify the interaction spots has had limited success for RBPs: such tools either report motifs so short that they are likely to occur by chance in random data, or they fail to cover a large spectrum of instances. The contextual sequence environment guides RBP-RNA interactions, which adds further complexity to the discovery of the actual interaction spots. Now that enough CLIP-seq data are available, deriving the principles of RBP-RNA interactions and their impact on regulation deserves prime focus. Some decent progress has been made in deriving models for these interactions.