Abstract

Long intergenic non-coding RNAs (lincRNAs) are associated with a wide variety of human diseases. High-throughput sequencing (HTS) platforms are making large volumes of lincRNA data available, which opens an opportunity for machine learning and data mining approaches to analyze disease associations more effectively. However, only a few in silico association inference tools are available to date, and none of them utilizes the heterogeneous data about lincRNAs and diseases. The standard Inductive Matrix Completion (IMC) technique provides a framework for modeling associations between the two entities while taking their respective side information into account. However, it has two major issues, pertaining to noise and sparsity in the dataset, so a robust version of IMC is needed. In this paper, we propose Robust Inductive Matrix Completion (RIMC) to address these challenges. We applied RIMC to the available association dataset between lincRNAs and OMIM disease phenotypes, using a diverse set of side information for both. The proposed method performs better than state-of-the-art methods in terms of precision@k and recall@k when prioritizing the top-k diseases for the subject lincRNAs. Moreover, an induction experiment showed that RIMC outperforms standard IMC at ranking unexplored disease phenotypes for a set of known lincRNAs.
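As an illustration of the evaluation metrics named above, the following is a minimal sketch of how precision@k and recall@k could be computed for a single lincRNA. The function name, the dictionary-based input format, and the example disease identifiers are assumptions made for illustration; they are not taken from the paper's implementation.

```python
def precision_recall_at_k(scores, relevant, k):
    """Compute precision@k and recall@k for one lincRNA.

    scores:   dict mapping a disease identifier to its predicted score
              (hypothetical input format, assumed for this sketch).
    relevant: set of disease identifiers known to be associated.
    k:        number of top-ranked diseases to evaluate.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    # Rank diseases by predicted score, highest first, and keep the top k.
    top_k = sorted(scores, key=scores.get, reverse=True)[:k]
    hits = sum(1 for disease in top_k if disease in relevant)
    # Precision divides by k even when fewer than k diseases were scored.
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical scores for four placeholder disease identifiers.
example_scores = {"d1": 0.9, "d2": 0.8, "d3": 0.1, "d4": 0.5}
example_relevant = {"d1", "d3"}
p, r = precision_recall_at_k(example_scores, example_relevant, k=2)
```

In a full evaluation these per-lincRNA values would typically be averaged over all test lincRNAs; the averaging scheme used in the paper is not stated in the abstract.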

