Abstract

Accumulating evidence from biological experiments and clinical trials shows that long non-coding RNAs (lncRNAs) are closely related to the occurrence and development of various complex human diseases. Research on lncRNA–disease associations helps to further clarify the pathogenesis of complex human diseases at the molecular level, but only a small proportion of lncRNA–disease associations have been confirmed. Given the high cost of biological experiments, exploring potential lncRNA–disease associations with computational approaches has become urgent. In this study, a model based on the closest node weight graph of the spatial neighborhood (CNWGSN) and the edge attention graph convolutional network (EAGCN), named LDA-EAGCN, was developed to uncover potential lncRNA–disease associations by integrating disease semantic similarity, lncRNA functional similarity, and known lncRNA–disease associations. Inspired by the success of EAGCN in chemical molecule property recognition, this study treats the prediction of lncRNA–disease associations as a component recognition problem on lncRNA–disease characteristic graphs. The CNWGSN features of lncRNA–disease associations, combined with known lncRNA–disease associations, were used to train EAGCN, and EAGCN then predicted a correlation score for each input pair to judge whether the input lncRNA is associated with the input disease. LDA-EAGCN achieved an AUC of 0.9853 in ten-fold cross-validation experiments, the highest among the five state-of-the-art models compared. Furthermore, case studies of renal cancer, laryngeal carcinoma, and liver cancer were carried out, and most of the top-ranking lncRNA–disease associations have been confirmed by recently published experimental literature. These results indicate that LDA-EAGCN is an effective model for predicting potential lncRNA–disease associations. Its source code and experimental data are available at https://github.com/HGDKMF/LDA-EAGCN.
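For readers unfamiliar with the evaluation metric, the following is a minimal sketch of how an AUC value and a ten-fold split can be computed. It is not the authors' evaluation code: the function names `auc_score` and `k_fold_indices`, the random seed, and the toy labels and scores are assumptions made for illustration only.

```python
import numpy as np

def auc_score(labels, scores):
    """Rank-based AUC (Mann-Whitney U statistic); tied scores get average ranks."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    n_pos = int(labels.sum())
    n_neg = int((~labels).sum())
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative sample")
    order = np.argsort(scores, kind="mergesort")
    sorted_scores = scores[order]
    ranks = np.empty(len(scores))
    i = 0
    while i < len(scores):
        j = i
        # Extend j over the run of scores tied with position i.
        while j + 1 < len(scores) and sorted_scores[j + 1] == sorted_scores[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2 + 1  # average 1-based rank of the run
        i = j + 1
    u_stat = ranks[labels].sum() - n_pos * (n_pos + 1) / 2
    return float(u_stat / (n_pos * n_neg))

def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_samples), k)

# Toy usage: labels are 1 for known associations, scores are model outputs.
toy_auc = auc_score([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1])
folds = k_fold_indices(25, k=10)
```

In a cross-validation run, each fold would be held out once as the test set while the model is trained on the remaining folds, and the AUC values (or pooled scores) would then be summarized across folds.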

Highlights

  • Long non-coding RNAs are a large and important class of non-coding RNAs with a length of more than 200 nucleotides (Ponting et al, 2009).

  • To better train the LDA-EAGCN model, negative samples were generated with the random walk with restart on heterogeneous networks (RWRH) algorithm of Li and Patra (2010). This algorithm ranks all candidate associations by likelihood according to the network structure, and long non-coding RNA (lncRNA)–disease pairs with low correlation scores are screened out as negative samples.

  • A model based on the closest node weight graph of the spatial neighborhood and the edge attention graph convolutional network was proposed to predict disease-related lncRNAs from multisource data.
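The negative-sample screening described in the highlights can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the score matrix is taken as given (in the paper it comes from RWRH), the function name `select_negative_samples` is hypothetical, and choosing as many negatives as there are known positives is an assumed default.

```python
import numpy as np

def select_negative_samples(scores, known, n_negatives=None):
    """Return the unlabeled (lncRNA, disease) index pairs with the lowest scores.

    scores: 2-D array of correlation scores, rows = lncRNAs, columns = diseases.
    known:  2-D boolean array of the same shape, True for known associations.
    """
    scores = np.asarray(scores, dtype=float)
    known = np.asarray(known, dtype=bool)
    if scores.shape != known.shape:
        raise ValueError("scores and known must have the same shape")
    if n_negatives is None:
        n_negatives = int(known.sum())  # assumed default: balance with positives
    # Only pairs without a known association are candidates for negatives.
    rows, cols = np.nonzero(~known)
    n_negatives = min(n_negatives, rows.size)
    # Sort candidates by ascending score and keep the lowest-scoring ones.
    order = np.argsort(scores[rows, cols], kind="stable")[:n_negatives]
    return list(zip(rows[order].tolist(), cols[order].tolist()))

# Toy usage with 2 lncRNAs and 3 diseases.
toy_scores = [[0.9, 0.1, 0.4],
              [0.2, 0.8, 0.05]]
toy_known = [[1, 0, 0],
             [0, 1, 0]]
negatives = select_negative_samples(toy_scores, toy_known)
```

Screening low-scoring pairs instead of sampling unlabeled pairs uniformly at random is intended to reduce the chance that an undiscovered true association is labeled as a negative.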

Summary

Introduction

Long non-coding RNAs (lncRNAs) are a large and important class of non-coding RNAs with a length of more than 200 nucleotides (Ponting et al, 2009). Many computational models that integrate large amounts of heterogeneous biological data have been proposed to predict novel lncRNA–disease associations. They can be categorized into two types. Yang et al (2014) constructed a coding–non-coding gene–disease bipartite network from the known associations between diseases and disease-causing genes, and applied a propagation algorithm to this network to mine 768 potential lncRNA–disease associations. Sun et al (2014) proposed a global network–based model, RWRlncD, which infers lncRNA–disease associations by running the random walk with restart algorithm on the lncRNA functional similarity network. It remains challenging to predict potential lncRNA–disease associations accurately in the absence of known lncRNA–disease association information.
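As an illustration of the random walk with restart idea mentioned above, here is a minimal generic sketch on a similarity network. It is not the RWRlncD implementation: the function name, the restart probability of 0.7, the convergence tolerance, and the toy chain network are all assumptions chosen for the example.

```python
import numpy as np

def random_walk_with_restart(similarity, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Score every node by its proximity to the seed nodes.

    similarity: square non-negative matrix (e.g. lncRNA functional similarity).
    seeds:      non-empty list of node indices (e.g. lncRNAs known for one disease).
    """
    W = np.asarray(similarity, dtype=float)
    if len(seeds) == 0:
        raise ValueError("at least one seed node is required")
    col_sums = W.sum(axis=0)
    col_sums[col_sums == 0] = 1.0      # avoid division by zero for isolated nodes
    W = W / col_sums                   # column-normalize into transition probabilities
    p0 = np.zeros(W.shape[0])
    p0[list(seeds)] = 1.0 / len(seeds) # walker starts uniformly on the seeds
    p = p0.copy()
    for _ in range(max_iter):
        # Move along the network with probability 1 - restart, else jump back to the seeds.
        p_next = (1 - restart) * (W @ p) + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p

# Toy usage: a 4-node chain 0 - 1 - 2 - 3 with node 0 as the only seed.
chain = [[0, 1, 0, 0],
         [1, 0, 1, 0],
         [0, 1, 0, 1],
         [0, 0, 1, 0]]
steady = random_walk_with_restart(chain, seeds=[0])
```

Nodes that are not seeds can then be ranked by their steady-state probability, so that candidates closer to the seeds in the network receive higher scores.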
