Abstract

In recent years, accumulating evidence has shown that the dysregulation of lncRNAs is associated with a wide range of human diseases. It is both necessary and feasible to analyze known lncRNA-disease associations, predict potential ones, and provide the most probable lncRNA-disease pairs for experimental validation. To address the limitations of traditional Random Walk with Restart (RWR), the model of Improved Random Walk with Restart for LncRNA-Disease Association prediction (IRWRLDA) was developed to predict novel lncRNA-disease associations by integrating known lncRNA-disease associations, disease semantic similarity, and various lncRNA similarity measures. The novelty of IRWRLDA lies in using lncRNA expression similarity and disease semantic similarity to set the initial probability vector of the RWR. As a result, IRWRLDA can be applied to diseases without any known related lncRNAs. IRWRLDA significantly improved on previous classical models, obtaining reliable AUCs of 0.7242 and 0.7872 on two known lncRNA-disease association datasets downloaded from the lncRNADisease database. Case studies of colon cancer and leukemia were further carried out with IRWRLDA, and 60% of the lncRNAs in the top 10 prediction lists have been confirmed by recent experimental reports.
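For readers unfamiliar with RWR, the following is a minimal sketch of the traditional random walk with restart on a similarity network, not the IRWRLDA model itself. The function name, the restart probability of 0.5, the toy similarity matrix, and the seed vector are all illustrative assumptions; the abstract does not specify how IRWRLDA constructs its initial probability vector beyond the two similarity measures it uses.

```python
import numpy as np

def random_walk_with_restart(similarity, p0, restart=0.5, tol=1e-6, max_iter=1000):
    """Generic RWR: iterate p <- (1 - r) * W @ p + r * p0 until convergence.

    similarity : square non-negative matrix (hypothetical lncRNA similarity)
    p0         : initial probability vector (seed weights, assumed non-zero)
    restart    : restart probability r (0.5 is an arbitrary illustrative value)
    """
    W = np.asarray(similarity, dtype=float)
    col_sums = W.sum(axis=0)
    col_sums[col_sums == 0] = 1.0      # avoid division by zero for isolated nodes
    W = W / col_sums                   # column-normalize into a transition matrix
    p0 = np.asarray(p0, dtype=float)
    p0 = p0 / p0.sum()                 # make the seed vector a probability vector
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * (W @ p) + restart * p0
        if np.abs(p_next - p).sum() < tol:   # L1 change below tolerance
            return p_next
        p = p_next
    return p

# Toy example: three lncRNAs, with the walk seeded at lncRNA 0.
sim = np.array([[1.0, 0.8, 0.1],
                [0.8, 1.0, 0.3],
                [0.1, 0.3, 1.0]])
seed = np.array([1.0, 0.0, 0.0])
scores = random_walk_with_restart(sim, seed)
ranking = np.argsort(-scores)          # candidate indices, most probable first
```

In the traditional setting the seed vector is built only from known disease-related lncRNAs, so it is empty for a disease with no known associations; this is the limitation the abstract says IRWRLDA addresses by deriving the initial vector from similarity information instead.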

Highlights

  • For a long time, genetic information was thought to be stored only in protein-coding genes, and RNA was regarded as mere transcriptional noise or as an intermediary between a DNA sequence and its encoded protein [1,2,3,4,5].

  • Leave-one-out cross validation (LOOCV) was implemented to evaluate the prediction performance of IRWRLDA on two versions of the lncRNA-disease association dataset (June-2012 Version and June-2014 Version) downloaded from the lncRNADisease database.

  • When LOOCV was implemented for the investigated disease, each known related lncRNA was left out in turn as a test sample, and the other known related lncRNAs were regarded as training samples.
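The leave-one-out procedure described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' evaluation code: `loocv_ranks` and `mean_similarity_score` are hypothetical names, the scoring function is a simple stand-in for a real predictor such as RWR, and the choice to rank the held-out lncRNA against all lncRNAs not in the training set is an assumption about the candidate pool.

```python
import numpy as np

def loocv_ranks(similarity, known, score_fn):
    """Hold out each known disease-related lncRNA in turn and record its rank.

    similarity : square lncRNA similarity matrix (hypothetical input)
    known      : indices of lncRNAs known to be related to the disease
    score_fn   : callable(similarity, training_indices) -> score per lncRNA
    """
    n = similarity.shape[0]
    known = list(known)
    ranks = {}
    for held_out in known:
        # All other known related lncRNAs act as training samples.
        training = [k for k in known if k != held_out]
        scores = score_fn(similarity, training)
        # Assumed candidate pool: every lncRNA that is not a training sample.
        candidates = [i for i in range(n) if i not in training]
        # Rank 1 means no candidate scored higher than the held-out lncRNA.
        rank = 1 + sum(scores[c] > scores[held_out] for c in candidates)
        ranks[held_out] = int(rank)
    return ranks

def mean_similarity_score(similarity, training):
    # Stand-in predictor: average similarity to the training lncRNAs.
    return similarity[:, training].mean(axis=1)

# Toy data: lncRNAs 0-2 form one similar group, lncRNAs 3-4 another.
sim = np.full((5, 5), 0.1)
sim[:3, :3] = 0.9
sim[3:, 3:] = 0.9
np.fill_diagonal(sim, 1.0)
ranks = loocv_ranks(sim, known=[0, 1, 2], score_fn=mean_similarity_score)
```

Sweeping a rank threshold over the recorded ranks gives the true and false positive rates from which an ROC curve and its AUC can be computed.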


Summary

Introduction

For a long time, genetic information was thought to be stored only in protein-coding genes, and RNA was regarded as mere transcriptional noise or as an intermediary between a DNA sequence and its encoded protein [1,2,3,4,5]. Guttman et al. (2009) integrated gene expression data, the presence of chromatin marks for promoter regions and gene bodies, and the known annotations of coding transcripts to propose the first large-scale lncRNA discovery approach. In their study, 1600 novel large intervening non-coding RNAs (lincRNAs) were discovered across four mouse cell types [23]. Another important example of lncRNA discovery is the work of Cabili et al. (2011), who integrated chromatin marks and RNA-sequencing (RNA-seq) data to identify more than 8000 lincRNAs across 24 different human cell types and tissues [24].

