Abstract

High-throughput technologies have driven rapid progress in our understanding of the genetics underlying biological processes. Despite these advances, the genetic landscape of human diseases has been only marginally uncovered. Given the large amounts of biological and phenotypic data now available, current knowledge of disease genetics can be used to train machine learning models that predict novel genetic factors associated with a disease. To this end, we developed DGLinker, a webserver for the prediction of novel candidate genes for human diseases given a set of known disease genes. DGLinker has a user-friendly interface that allows non-expert users to exploit biomedical information from a wide range of biological and phenotypic databases, and/or to upload their own data, to generate a knowledge graph and use machine learning to predict new disease-associated genes. The webserver includes tools to explore and interpret the results and generates publication-ready figures. DGLinker is available at https://dglinker.rosalind.kcl.ac.uk. The webserver is free and open to all users without the need for registration.

Highlights

  • Because high-throughput technologies are now a common tool in the biomedical field, vast amounts of biological and phenotypic information are available

  • To maximise usability, DGLinker offers a user-friendly interface that requires no informatics skills and a flexible analysis framework that gives the user control over the data used to generate the knowledge graph and to train the predictive model

  • The DGLinker pipeline consists of four main steps: (i) specification of known disease-associated genes, (ii) selection (and/or upload) of the data used to generate the knowledge graph (KG), (iii) machine learning (ML) training and disease gene (DG) prediction, (iv) results visualization and evaluation (Figure 2)
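
The four steps above can be illustrated with a minimal sketch. It is not DGLinker's implementation: the gene names, relations, and triples are hypothetical toy data, and a simple Jaccard similarity over shared graph neighbours stands in for the machine learning model, whose details are not given here.

```python
from collections import defaultdict

# Step (i): known disease-associated genes (hypothetical toy labels).
known_disease_genes = {"GENE_A", "GENE_B"}

# Step (ii): a toy knowledge graph as (gene, relation, entity) triples.
# All names are invented for illustration only.
triples = [
    ("GENE_A", "in_pathway", "pathway_1"),
    ("GENE_A", "has_phenotype", "phenotype_1"),
    ("GENE_B", "in_pathway", "pathway_1"),
    ("GENE_B", "has_phenotype", "phenotype_2"),
    ("GENE_C", "in_pathway", "pathway_1"),
    ("GENE_C", "has_phenotype", "phenotype_1"),
    ("GENE_D", "in_pathway", "pathway_2"),
]


def build_neighbours(triples):
    """Map each gene to the set of (relation, entity) pairs it is linked to."""
    neighbours = defaultdict(set)
    for gene, relation, entity in triples:
        neighbours[gene].add((relation, entity))
    return neighbours


def score_candidates(neighbours, known):
    """Step (iii), simplified: score each non-known gene by its mean
    Jaccard similarity to the known disease genes (an assumed stand-in
    for a trained model)."""
    scores = {}
    for gene, features in neighbours.items():
        if gene in known:
            continue
        similarities = []
        for known_gene in known:
            known_features = neighbours.get(known_gene, set())
            union = features | known_features
            shared = features & known_features
            similarities.append(len(shared) / len(union) if union else 0.0)
        scores[gene] = sum(similarities) / len(similarities) if similarities else 0.0
    return scores


def rank_candidates(scores):
    """Step (iv), simplified: sort candidates by descending score."""
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


neighbours = build_neighbours(triples)
scores = score_candidates(neighbours, known_disease_genes)
ranking = rank_candidates(scores)
for gene, score in ranking:
    print(f"{gene}\t{score:.2f}")
```

In this toy graph, GENE_C shares a pathway and a phenotype with the known genes and therefore ranks above GENE_D, which shares nothing. A real analysis would replace the similarity score with a model trained and evaluated on held-out known genes.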


