Abstract

In recent years, it has become increasingly clear that long noncoding RNAs (lncRNAs) play critical roles in many biological processes associated with human diseases. Inferring potential lncRNA-disease associations is essential for uncovering disease mechanisms, developing novel drugs, and optimizing personalized treatments. However, biological experiments to validate lncRNA-disease associations are time-consuming and costly, so effective computational models are needed. In this study, we propose a method called BPLLDA that predicts lncRNA-disease associations based on paths of fixed lengths in a heterogeneous lncRNA-disease association network. Specifically, BPLLDA first constructs a heterogeneous lncRNA-disease network by integrating the lncRNA-disease association network, the lncRNA functional similarity network, and the disease semantic similarity network. It then infers the probability of an lncRNA-disease association from the paths connecting the lncRNA and the disease in the network and from the lengths of those paths. Compared to existing methods, BPLLDA has two main advantages: it requires no negative samples, and it can predict associations involving novel lncRNAs or novel diseases. BPLLDA was evaluated on LncRNADisease, a canonical lncRNA-disease association database, alongside two popular methods, LRLSLDA and GrwLDA. In leave-one-out cross-validation, the areas under the receiver operating characteristic curve of BPLLDA are 0.87117, 0.82403, and 0.78528 for predicting overall associations, associations related to novel lncRNAs, and associations related to novel diseases, respectively, all higher than those of the two compared methods. In addition, cervical cancer, glioma, and non-small-cell lung cancer were selected as case studies, for which the top five predicted lncRNA-disease associations were verified by recently published literature.
In summary, BPLLDA performs well in predicting novel lncRNA-disease associations, including associations involving novel lncRNAs and novel diseases. It may contribute to the understanding of lncRNA-associated diseases such as certain cancers.
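
The abstract does not state BPLLDA's scoring formula, so the following is only a minimal sketch of the general idea of scoring lncRNA-disease pairs by the connections between them in a heterogeneous network. It rests on several assumptions that are not taken from the paper: the three networks are stacked into one block adjacency matrix, weighted walks (nodes may repeat, unlike simple paths) are counted through matrix powers, and longer walks are down-weighted by a hypothetical `decay` factor. The function names, `max_len`, `decay`, and the toy matrices are all illustrative.

```python
import numpy as np


def build_heterogeneous_adjacency(lnc_sim, dis_sim, assoc):
    """Stack the three networks into one block adjacency matrix.

    Node order: all lncRNAs first, then all diseases.
    """
    lnc_sim = np.array(lnc_sim, dtype=float)
    dis_sim = np.array(dis_sim, dtype=float)
    assoc = np.array(assoc, dtype=float)
    # Assumption: self-similarity is dropped so that no node links to itself.
    np.fill_diagonal(lnc_sim, 0.0)
    np.fill_diagonal(dis_sim, 0.0)
    return np.block([[lnc_sim, assoc], [assoc.T, dis_sim]])


def walk_scores(lnc_sim, dis_sim, assoc, max_len=3, decay=0.5):
    """Score every lncRNA-disease pair by weighted walks of length <= max_len.

    Entry (i, j) of adj**k sums the edge-weight products of all length-k
    walks from node i to node j; decay**k is an assumed length penalty.
    """
    n_lnc = np.asarray(assoc).shape[0]
    adj = build_heterogeneous_adjacency(lnc_sim, dis_sim, assoc)
    power = np.eye(adj.shape[0])
    total = np.zeros_like(adj)
    for k in range(1, max_len + 1):
        power = power @ adj           # adj ** k
        total += (decay ** k) * power
    # Keep only the lncRNA-to-disease block of the score matrix.
    return total[:n_lnc, n_lnc:]


# Toy data (made up): 3 lncRNAs, 2 diseases, one known association.
lnc_sim = [[1.0, 0.8, 0.0],
           [0.8, 1.0, 0.0],
           [0.0, 0.0, 1.0]]
dis_sim = [[1.0, 0.5],
           [0.5, 1.0]]
assoc = [[1, 0],
         [0, 0],
         [0, 0]]

scores = walk_scores(lnc_sim, dis_sim, assoc)
print(scores)
```

In this toy network, the second lncRNA has no known association but still receives a nonzero score for the first disease through its similarity to the first lncRNA, which illustrates how a path-based approach can rank candidates for novel lncRNAs without negative samples. The third lncRNA is disconnected from everything, so all of its scores are zero.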

Highlights

  • The human genome contains about 20,000 protein-coding genes, which account for less than 2% of the genome (Bertone et al, 2004; Claverie, 2005)

  • Many studies have demonstrated that long noncoding RNAs (lncRNAs) are essential in many physiological processes related to human diseases

  • Biological experiments to validate lncRNA-disease associations are time-consuming and costly, which motivates the development of computational prediction models

Introduction

The human genome contains about 20,000 protein-coding genes, which account for less than 2% of the genome (Bertone et al, 2004; Claverie, 2005). Many recent studies have suggested that noncoding RNAs (ncRNAs) play key regulatory roles in important biological processes such as cell proliferation (Esteller, 2011). Based on their sizes, ncRNAs can be divided into long ncRNAs (lncRNAs) (Pauli et al, 2011) and small ncRNAs such as microRNAs (miRNAs) (Farazi et al, 2013), transfer RNAs (tRNAs) (Birney et al, 2007), and Piwi-interacting RNAs (piRNAs) (Li et al, 2013). Compared to protein-coding RNAs, lncRNAs are less conserved across species (Harrow et al, 2012; Cabili et al, 2016), are expressed at relatively low levels, show more tissue-specific expression patterns (Guttman et al, 2010), and have longer but fewer exons (Chen, 2015). A growing number of lncRNAs have been identified in eukaryotes, from nematodes to humans, owing to advances in sequencing technologies and computational methods (Awan et al, 2017).
