Abstract
Nucleotide variants can cause functional changes by altering protein–RNA binding in various ways that are not easy to predict. This can affect processes such as splicing, nuclear shuttling, and stability of the transcript. Therefore, correct modeling of protein–RNA binding is critical when predicting the effects of sequence variations. Many RNA-binding proteins recognize a diverse set of motifs and binding is typically also dependent on the genomic context, making this task particularly challenging. Here, we present DeepCLIP, the first method for context-aware modeling and prediction of protein binding to RNA using exclusively sequence data as input. We show that DeepCLIP outperforms existing methods for modeling protein–RNA binding. Importantly, we demonstrate that DeepCLIP predictions correlate with the functional outcomes of nucleotide variants in independent wet lab experiments. Furthermore, we show how DeepCLIP binding profiles can be used in the design of therapeutically relevant antisense oligonucleotides, and to uncover possible position-dependent regulation in a tissue-specific manner. DeepCLIP is freely available as a stand-alone application and as a webtool at http://deepclip.compbio.sdu.dk.
Highlights
The massive technological progress in next-generation sequencing (NGS) technologies has made sequencing affordable in the context of precision medicine and personalized health
Models are created by training a network on a set of known binding sites and a set of background genomic sequences (Figure S1d,e), which can optionally be generated by DeepCLIP by providing binding locations instead of raw binding sequences
We found that DeepCLIP was the overall best classifier in every pair-wise comparison and when looking at the mean area under the receiver operating characteristic curve (AUROC) score, underscoring that DeepCLIP performs well on a broad set of data
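To make the AUROC comparison in the last highlight concrete, the following is a minimal sketch of how an AUROC and a mean AUROC across datasets can be computed from binding labels and classifier scores. It is not DeepCLIP's own evaluation code: the function names `auroc` and `mean_auroc`, the dataset names, and all numbers are hypothetical and chosen only for illustration.

```python
def auroc(labels, scores):
    """AUROC via the rank-sum (Mann-Whitney U) formulation.

    labels: 1 for a bound sequence, 0 for a background sequence.
    scores: classifier scores, where higher means more likely bound.
    Tied scores receive the average rank of their tie group.
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j to the end of the group of equal scores.
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the group
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1

    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative example")

    rank_sum_pos = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


def mean_auroc(datasets):
    """Average the AUROC over a dict of {name: (labels, scores)}."""
    values = [auroc(labels, scores) for labels, scores in datasets.values()]
    return sum(values) / len(values)


# Toy data (hypothetical, not taken from the paper).
toy = {
    "rbp_a": ([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]),
    "rbp_b": ([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1]),
}
print(mean_auroc(toy))
```

An AUROC of 1.0 means every bound sequence scores higher than every background sequence, while 0.5 corresponds to random ranking. Averaging over datasets, as in `mean_auroc`, gives each dataset equal weight regardless of its size.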
Summary
The massive technological progress in next-generation sequencing (NGS) technologies has made sequencing affordable in the context of precision medicine and personalized health. NGS analysis enables identification of millions of sequence variants in each patient sample, increasing the need for in silico prediction of the functional consequences of a diverse range of variations. The effect of deep intronic sequence variants at the mRNA level through altered binding to RNA-binding proteins (RBPs) is difficult to predict in silico, as existing tools’ predictions of functional outcomes of splicing are primarily based on the analysis of point mutations within or near exons[1,2,3]. Improving information on whether contexts act positively or negatively with regard to binding is an important area of research that will enable the development of novel therapeutic options in personalized medicine.