Abstract

Background: A large number of long intergenic non-coding RNAs (lincRNAs) are linked to a broad spectrum of human diseases, but the disease associations of many other lincRNAs remain a puzzle. Validating such links through biological experiments is expensive. A plethora of lincRNA data is now available, thanks to High Throughput Sequencing (HTS) platforms, Genome Wide Association Studies (GWAS), and similar sources, which opens the opportunity for machine learning and data mining approaches to extract meaningful relationships between lincRNAs and diseases. Nevertheless, only a few in silico lincRNA-disease association inference tools are available to date, and none of them uses side information about both entities simultaneously in a single framework.

Methods: The recently developed Inductive Matrix Completion (IMC) technique provides a recommendation platform between two entities that takes side information about each of them into account. However, the standard IMC formulation cannot handle noise and outliers that may be present in the datasets, and it does not account for data sparsity. A robust version of IMC that addresses both issues is therefore needed. As a remedy, in this paper we propose Stable Robust Inductive Matrix Completion (SRIMC), which uses l2,1 norm based regularization to optimize the objective function with a unique 2-step stable solution approach.

Results: We applied SRIMC to the available association data between human lincRNAs and OMIM disease phenotypes, together with a diverse set of side information about the lincRNAs and the diseases. The method performs better than the state-of-the-art methods in terms of precision@k and recall@k when prioritizing the top-k diseases for the subject lincRNAs. We also demonstrate that SRIMC is equally effective for queries about novel lincRNAs, as well as for predicting the rank of a newly known disease for a set of well-characterized lincRNAs.

Conclusions: With the experimental results and computational evaluation, we show that SRIMC is robust in handling datasets with noise and outliers, as well as in dealing with novel lincRNAs and disease phenotypes.
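To make the evaluation metrics named in the Results concrete, the following is a minimal sketch of how precision@k and recall@k could be computed for a single lincRNA. It is not the authors' evaluation code: the function name `precision_recall_at_k`, the disease identifiers, and the convention of dividing precision by k even when fewer than k diseases are ranked are all assumptions made for illustration.

```python
from typing import Sequence, Set, Tuple


def precision_recall_at_k(
    ranked: Sequence[str], relevant: Set[str], k: int
) -> Tuple[float, float]:
    """Return (precision@k, recall@k) for one query lincRNA.

    ranked   -- disease IDs ordered from highest to lowest predicted score
    relevant -- disease IDs known to be associated with the lincRNA
    k        -- number of top-ranked diseases to inspect
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    # Count how many of the top-k predictions are known associations.
    hits = sum(1 for disease in ranked[:k] if disease in relevant)
    # Assumed convention: precision is always divided by k.
    precision = hits / k
    # Recall is undefined with no known associations; report 0.0 here.
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical ranking for one lincRNA (IDs are placeholders, not real data).
ranked_diseases = ["D3", "D1", "D7", "D2", "D5"]
known_diseases = {"D1", "D2", "D9"}

p_at_3, r_at_3 = precision_recall_at_k(ranked_diseases, known_diseases, k=3)
p_at_5, r_at_5 = precision_recall_at_k(ranked_diseases, known_diseases, k=5)
```

In a full evaluation these per-lincRNA values would typically be averaged over all test lincRNAs for each k; the averaging scheme used in the paper is not stated in this summary.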

Highlights

  • Surprisingly, only 2% of the entire human genome codes for proteins [1]

  • We demonstrate that Stable Robust Inductive Matrix Completion (SRIMC) is effective for querying about novel lincRNAs, as well as predicting rank of a newly known disease for a set of well-characterized lincRNAs

  • With the experimental results and computational evaluation, we show that SRIMC is robust in handling datasets with noise and outliers as well as dealing with novel lincRNAs and disease phenotypes


Summary

Introduction

LincRNA-disease association inference problem

Surprisingly, only 2% of the entire human genome codes for proteins [1]. It has become evident that the non-protein-coding portion of the genome is of critical functional importance, especially the long intergenic non-coding RNAs (lincRNAs), which are each longer than 200 bases and do not overlap any annotated protein-coding region. These lincRNAs demonstrate diverse molecular mechanisms and are implicated in various human diseases [2]. Fully annotating the functions of the lincRNAs and their involvement in human diseases remains a challenge for researchers. Developing a machine learning algorithm that ranks the disease implications of a given lincRNA based on prior knowledge would help the community tackle this challenge.

