Abstract
Long non-coding RNAs (lncRNAs) are critical regulators of biological processes and are closely associated with complex diseases. Although next-generation sequencing technology has facilitated the discovery of a great number of lncRNAs, knowledge of their functions remains limited. Predicting the functions of lncRNAs is therefore promising, as it sheds light on the mechanisms of complex diseases. Current algorithms predict the functions of lncRNAs by using the features of protein-coding genes. Generally speaking, these algorithms fuse heterogeneous genomic data to construct lncRNA-gene associations via a linear combination, which cannot fully characterize the function-lncRNA relations. To overcome this issue, we present a nonnegative matrix factorization algorithm with multiple partial regularization (MPrNMF) to predict the functions of lncRNAs without fusing the heterogeneous genomic data. In detail, for each type of genomic data, we construct the lncRNA-gene associations, resulting in multiple sets of associations. The proposed method integrates them separately via a regularization strategy, rather than fusing them into a single type of association. The results demonstrate that the proposed algorithm outperforms state-of-the-art methods based on network analysis. The model and algorithm provide an effective way to explore the functions of lncRNAs.
Highlights
Long non-coding RNAs are non-coding RNAs more than 200 nucleotides in length that have very little or no potential to encode proteins (Mercer et al, 2009)
Let n_o be the number of ontological functions in Gene Ontology (GO), n_g be the number of proteins in the protein-protein interaction (PPI) network, and n_l be the number of long non-coding RNAs (lncRNAs) in the co-expression network
The gene-disease associations are downloaded from the OMIM database, while the lncRNA-disease associations are downloaded from the LncRNADisease database
Summary
Long non-coding RNAs (lncRNAs) are non-coding RNAs more than 200 nucleotides in length that have very little or no potential to encode proteins (Mercer et al, 2009). To make use of global information, Guo et al (2013) constructed a bi-colored network by integrating the expression profiles of lncRNAs and genes, and proposed the lnc-GFP algorithm to predict the functions of lncRNAs. Jiang et al (2015) employed a statistical test to annotate the functions of lncRNAs. Recently, Zhang et al (2018) proposed the NeuralNetL2GO algorithm, which uses neural networks to annotate lncRNAs. Many different types of genomic data link lncRNAs and genes, for example gene co-expression, associations with diseases, and protein binding sites. Current algorithms integrate multiple heterogeneous genomic data into a single network via weighted or unweighted linear functions, an approach that has been criticized for not fully characterizing the links between lncRNAs and genes.
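The criticism of linear fusion can be illustrated with a toy example. The sketch below is not the MPrNMF objective, which is not given here. It only assumes that each data source yields an lncRNA-gene association matrix of the same shape, and it uses a plain squared-error penalty as a stand-in for a regularization term. All names (`fuse_linear`, `fused_penalty`, `separate_penalty`) and all matrix values are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def fuse_linear(sources: List[Matrix], weights: List[float]) -> Matrix:
    """Weighted linear combination of same-shaped association matrices."""
    rows, cols = len(sources[0]), len(sources[0][0])
    fused = [[0.0] * cols for _ in range(rows)]
    for matrix, weight in zip(sources, weights):
        for i in range(rows):
            for j in range(cols):
                fused[i][j] += weight * matrix[i][j]
    return fused


def sq_error(a: Matrix, b: Matrix) -> float:
    """Sum of squared entry-wise differences between two matrices."""
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def fused_penalty(pred: Matrix, sources: List[Matrix], weights: List[float]) -> float:
    """One penalty against the single fused network."""
    return sq_error(pred, fuse_linear(sources, weights))


def separate_penalty(pred: Matrix, sources: List[Matrix], lambdas: List[float]) -> float:
    """One weighted penalty per source; the sources are never merged."""
    return sum(lam * sq_error(pred, m) for m, lam in zip(sources, lambdas))


# Made-up 2 lncRNA x 2 gene associations from two hypothetical sources.
# Case A: the two sources disagree completely.
case_a = [[[1.0, 0.0], [0.0, 1.0]],   # e.g. co-expression
          [[0.0, 1.0], [1.0, 0.0]]]   # e.g. shared disease links
# Case B: the two sources agree on uniform, weak associations.
case_b = [[[0.5, 0.5], [0.5, 0.5]],
          [[0.5, 0.5], [0.5, 0.5]]]

weights = [0.5, 0.5]
fused_a = fuse_linear(case_a, weights)
fused_b = fuse_linear(case_b, weights)

# A candidate prediction that matches the fused network exactly.
pred = [[0.5, 0.5], [0.5, 0.5]]
```

In this toy setting both cases collapse to the same fused matrix, so `fused_penalty` cannot tell them apart, whereas `separate_penalty` is nonzero for case A and zero for case B. This is one way to read the claim that a single fused network does not fully characterize the links between lncRNAs and genes.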