Orthogonal matrix factorization enables integrative analysis of multiple RNA binding proteins.

Martin Stražar, Tomaž Curk, Blaž Zupan, Jernej Ule, Marinka Žitnik

doi:10.1093/bioinformatics/btw003

Open Access

https://doi.org/10.1093/bioinformatics/btw003

Abstract

Motivation: RNA binding proteins (RBPs) play important roles in post-transcriptional control of gene expression, including splicing, transport, polyadenylation and RNA stability. To model protein–RNA interactions by considering all available sources of information, it is necessary to integrate the rapidly growing RBP experimental data with the latest genome annotation, gene function, RNA sequence and structure. Such integration is possible by matrix factorization, where current approaches have an undesired tendency to identify only a small number of the strongest patterns with overlapping features. Because protein–RNA interactions are orchestrated by multiple factors, methods that identify discriminative patterns of varying strengths are needed.

Results: We have developed an integrative orthogonality-regularized nonnegative matrix factorization (iONMF) to integrate multiple data sources and discover non-overlapping, class-specific RNA binding patterns of varying strengths. The orthogonality constraint halves the effective size of the factor model and outperforms other NMF models in predicting RBP interaction sites on RNA. We have integrated the largest data compendium to date, which includes 31 CLIP experiments on 19 RBPs involved in splicing (such as hnRNPs, U2AF2, ELAVL1, TDP-43 and FUS) and processing of 3′ UTR (Ago, IGF2BP). We show that the integration of multiple data sources improves the predictive accuracy of retrieval of RNA binding sites. In our study the key predictive factors of protein–RNA interactions were the position of RNA structure and sequence motifs, RBP co-binding and gene region type. We report on a number of protein-specific patterns, many of which are consistent with experimentally determined properties of RBPs.

Availability and implementation: The iONMF implementation and example datasets are available at https://github.com/mstrazar/ionmf.

Contact: tomaz.curk@fri.uni-lj.si

Supplementary information: Supplementary data are available at Bioinformatics online.

Highlights

References and details for all experiments used in the study are listed.
Depending on the experimental protocol used (PARCLIP, CLIPSEQ, iCLIP, HITSCLIP), we report the number of cross-linking clusters and the number of individual sites for each experiment.
>HSU14570 Human Alu-Sb2 subfamily consensus sequence. (rev. complement)
TTTTTTTTGAGACGGAGTCTCGCTCTGTCGCCCAGGCCGGACTGCGGACTGCAGTGGCGCAATCTCGGCTCACTGCAAGCT
TCCGCTTCCCGGGTTCACGCCATTCTCCTGCCTCAGCCTCCCCAGTAGCTGGGACTACAGGCGCCCGCCACCGCGCCCGG
CTAATTTTTTGTATTTTTAGTAGAGACGGGGTTTCACCTTGTTAGCCAGGATGGTCTCGATCTCCTGACCTCATGATCCAC
CCCGCCTCGGCCTCCCAAAGTGCTGGGATTACAGGCGTGAGCCACCGCGCCCGGCC

Summary

Derivation of update rules

The general matrix factorization problem can be solved with several different optimization approaches. Non-convexity of the problem follows from observing that there exist equivalent solutions $X = W U^T U H$, where $U$ is any unitary matrix of appropriate size.

To learn a factor model with iONMF we propose the following optimization problem with respect to $W$ and $H_i$ for $i = 1, \ldots, N$: $\min_{W,H} \ldots$, with

$\lambda_i = -X_i^T W + H_i W^T W + \alpha (2 H_i H_i^T H_i - H_i)$.

To satisfy the Karush-Kuhn-Tucker optimality conditions at a stationary point we must have

$H_i \circ \lambda_i = 0$,

which leads to the following update rules:

$H_i^2 \circ (\lambda_i^+ - \lambda_i^-) = 0$.

This is exactly the update rule in Equation 3 (see main text).
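The stationarity condition above can be explored numerically. The following NumPy sketch is illustrative only and is not taken from the iONMF repository. It assumes the convention $X_i \approx W H_i^T$ (implied by the matrix shapes in the expression for $\lambda_i$), assumes that $\lambda_i^+$ and $\lambda_i^-$ collect the positively and negatively signed terms of $\lambda_i$, and assumes a ratio-form multiplicative update; Equation 3 itself is not reproduced on this page, so the exact form there may differ. The function names `lambda_parts` and `update_H` are hypothetical.

```python
import numpy as np

def lambda_parts(X, W, H, alpha):
    """Split lambda = -X^T W + H W^T W + alpha (2 H H^T H - H)
    into two elementwise nonnegative parts, lambda = lam_pos - lam_neg.
    (Assumed split: positive-signed terms vs. negative-signed terms.)"""
    lam_pos = H @ (W.T @ W) + 2.0 * alpha * H @ (H.T @ H)
    lam_neg = X.T @ W + alpha * H
    return lam_pos, lam_neg

def update_H(X, W, H, alpha, eps=1e-12):
    """One multiplicative step H <- H * lam_neg / lam_pos (assumed form).
    A fixed point of this map satisfies H * (lam_pos - lam_neg) = 0."""
    lam_pos, lam_neg = lambda_parts(X, W, H, alpha)
    return H * lam_neg / (lam_pos + eps)

rng = np.random.default_rng(0)
n, m, r = 30, 20, 4          # rows, columns, factorization rank (arbitrary)
X = rng.random((n, m))       # nonnegative data matrix
W = rng.random((n, r))       # nonnegative coefficient matrix
H = rng.random((m, r))       # nonnegative basis matrix, X ~ W H^T
alpha = 0.1                  # orthogonality regularization weight

# The split reproduces the full expression for lambda term by term.
lam_pos, lam_neg = lambda_parts(X, W, H, alpha)
lam = -X.T @ W + H @ W.T @ W + alpha * (2 * H @ H.T @ H - H)
assert np.allclose(lam, lam_pos - lam_neg)

# A multiplicative step keeps H elementwise nonnegative.
H_new = update_H(X, W, H, alpha)

# Non-uniqueness: inserting U^T U for an orthogonal U leaves the product unchanged.
U, _ = np.linalg.qr(rng.standard_normal((r, r)))
same_product = np.allclose(W @ H.T, (W @ U.T) @ (U @ H.T))
```

Because every factor in the ratio is nonnegative, the step cannot introduce negative entries, which is the usual motivation for multiplicative rules in NMF. The last check illustrates why the unregularized problem has many equivalent solutions, although a rotated factor pair is not necessarily nonnegative.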

Equivalence to gradient descent

Effect of orthogonality on predictive performance and model sparseness

Model parameters

Prediction accuracy for data source subsets on individual RBPs

Discovery of RNA motifs

Sequences of Alu elements bound and regulated by hnRNPC

Journal: Bioinformatics	Publication Date: Jan 18, 2016
Citations: 115	License type: CC BY 4.0
