Abstract
Background: Numerous studies have demonstrated that long non-coding RNAs (lncRNAs) are related to many human diseases. It is therefore crucial to predict potential lncRNA-disease associations for disease prognosis, diagnosis and therapy. Dozens of machine learning and deep learning algorithms have been applied to this problem, yet it remains challenging to learn efficient low-dimensional representations from the high-dimensional features of lncRNAs and diseases so that unknown lncRNA-disease associations can be predicted accurately.

Results: We propose an end-to-end model, VGAELDA, which integrates variational inference and graph autoencoders for lncRNA-disease association prediction. VGAELDA contains two kinds of graph autoencoders. Variational graph autoencoders (VGAE) infer representations from the features of lncRNAs and diseases respectively, while graph autoencoders propagate labels via known lncRNA-disease associations. These two kinds of autoencoders are trained alternately using the variational expectation-maximization (EM) algorithm. Combining the VGAE for graph representation learning with alternate training via variational inference strengthens the capability of VGAELDA to capture efficient low-dimensional representations from high-dimensional features, and hence improves the robustness and precision of predicting unknown lncRNA-disease associations. Further analysis shows that the co-training framework of lncRNA and disease designed for VGAELDA solves a geometric matrix completion problem, capturing efficient low-dimensional representations via a deep learning approach.

Conclusion: Cross validations and numerical experiments illustrate that VGAELDA outperforms the current state-of-the-art methods in lncRNA-disease association prediction. Case studies indicate that VGAELDA is capable of detecting potential lncRNA-disease associations. The source code and data are available at https://github.com/zhanglabNKU/VGAELDA.
Highlights
Long non-coding RNAs (lncRNAs) are RNAs longer than 200 nucleotides that lack protein-coding function, yet they can still influence a series of biological processes, such as gene transcription, cell apoptosis, hormonal regulation, and immune response.
VGAELDA has the following advantages. (i) The variational graph autoencoder (VGAE) is well suited to inferring low-dimensional representations from high-dimensional features in a graph, and these representations better depict similarities and dependencies among nodes. This can significantly enhance the robustness and precision of prediction without relying on handcrafted feature similarities. (ii) VGAELDA implements the variational expectation-maximization (EM) algorithm as a representation learning framework, by training the feature inference autoencoder and the label propagation autoencoder alternately. (iii) VGAELDA provides a useful solution to the geometric matrix completion problem via deep learning, because autoencoders tend to minimize the rank of their outputs, and we suggest that manifold regularization can be obtained via the alternate training of the two graph autoencoders. (iv) VGAELDA implements an efficient way to integrate information from the lncRNA space and the disease space.
Experiments show that VGAELDA is superior to the current state-of-the-art methods, and case studies on several diseases illustrate the capability of VGAELDA to detect new lncRNA-disease associations.
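To make advantage (i) more concrete, the following is a minimal NumPy sketch of the forward pass of a generic variational graph autoencoder: a graph-convolutional encoder that outputs a mean and log-variance per node, a reparameterized sample of the latent representation, an inner-product decoder, and the KL term of the variational objective. It is not the VGAELDA implementation. The function names, the two-layer encoder, the layer sizes, the toy graph and the random weights are all assumptions made for illustration, and the sketch omits training, the label propagation autoencoder, and the alternating variational EM procedure.

```python
import numpy as np


def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2}, the usual GCN propagation matrix."""
    a_hat = adj + np.eye(adj.shape[0])   # add self-loops
    deg = a_hat.sum(axis=1)              # degrees are >= 1 thanks to self-loops
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def vgae_forward(adj, features, w_hidden, w_mu, w_logvar, rng):
    """One forward pass of a generic VGAE (hypothetical helper, untrained)."""
    a_norm = normalize_adjacency(adj)
    # Encoder: one shared graph-convolution layer with ReLU ...
    hidden = np.maximum(a_norm @ features @ w_hidden, 0.0)
    # ... followed by two linear graph-convolution heads.
    mu = a_norm @ hidden @ w_mu
    logvar = a_norm @ hidden @ w_logvar
    # Reparameterization: z = mu + sigma * eps with eps ~ N(0, I).
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(0.5 * logvar) * eps
    # Inner-product decoder: edge probability = sigmoid(z_i . z_j).
    recon = 1.0 / (1.0 + np.exp(-(z @ z.T)))
    # KL divergence from N(mu, sigma^2) to N(0, I), averaged over nodes.
    kl = -0.5 * np.sum(1.0 + logvar - mu**2 - np.exp(logvar), axis=1).mean()
    return z, recon, kl


# Toy example (assumed data): 4 nodes, 6 input features, 2 latent dimensions.
rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
features = rng.standard_normal((4, 6))
w_hidden = 0.1 * rng.standard_normal((6, 5))
w_mu = 0.1 * rng.standard_normal((5, 2))
w_logvar = 0.1 * rng.standard_normal((5, 2))

z, recon, kl = vgae_forward(adj, features, w_hidden, w_mu, w_logvar, rng)
print(z.shape, recon.shape)
```

In this sketch the 6-dimensional node features are compressed to 2-dimensional latent vectors, and the decoder scores every node pair from those vectors alone, which is the sense in which the learned representations stand in for handcrafted similarities.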
Summary
LncRNAs are RNAs longer than 200 nucleotides that lack protein-coding function, yet they can still influence a series of biological processes, such as gene transcription, cell apoptosis, hormonal regulation, and immune response. LncRNAs are closely linked to many human diseases [1,2,3]. It is essential to predict potential lncRNA-disease associations for disease prevention, detection, diagnosis and treatment. Only a small number of lncRNA-disease associations have been discovered so far, so it would be ideal to predict more potential associations using computational approaches. Computational methods, especially machine learning algorithms, are more time-efficient and cost-effective than experimental methods at detecting potential lncRNA-disease associations. Dozens of machine learning and deep learning algorithms have been applied to this problem, yet it remains challenging to learn efficient low-dimensional representations from the high-dimensional features of lncRNAs and diseases so that unknown lncRNA-disease associations can be predicted accurately.