Abstract

High-throughput experimental methods such as medical sequencing and genome-wide association studies (GWAS) identify increasingly large numbers of potential relations between genetic variants and diseases. Both biological complexity (millions of potential gene-disease associations) and the accelerating rate of data production necessitate computational approaches to prioritize and rationalize potential gene-disease relations. Here, we use concept profile technology to expose from the biomedical literature both explicitly stated gene-disease relations (the explicitome) and a much larger set of implied gene-disease associations (the implicitome). Implicit relations are largely unknown to, or even unintended by, the original authors, but they vastly extend the reach of existing biomedical knowledge for the identification and interpretation of gene-disease associations. The implicitome can be used in conjunction with experimental data resources, and we demonstrate its usefulness by rationalizing known and novel gene-disease associations, including those from GWAS. To facilitate the re-use of implicit gene-disease associations, we publish our data in compliance with FAIR Data Publishing recommendations [https://www.force11.org/group/fairgroup] using nanopublications. An online tool (http://knowledge.bio) is available to explore established and potential gene-disease associations in the context of other biomedical relations.

Highlights

  • Organizing the knowledge space around disease pathophysiology and phenotypes is crucial for data interpretation [1,2] and translational medicine [3]

  • There are three steps to the analysis: (1) We index the biomedical terminology in a large corpus of MEDLINE abstracts using an extensive thesaurus, mapping and disambiguating terms to over 600,000 biomedical concepts; (2) For each … [figure: doi:10.1371/journal.pone.0149621.g001]

  • Of the 417,561,711 possible gene-disease pairs (19,113 genes x 21,847 diseases in our thesaurus) more than half (213,489,335) lacked sufficient literature representation to build a concept profile for either one or both of the concepts in the pair
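The pair counts quoted in the last highlight can be checked with a few lines of arithmetic. The sketch below uses only the figures stated above; the variable names are illustrative and not taken from the paper.

```python
# Figures quoted in the highlight above.
genes = 19_113
diseases = 21_847
pairs_without_profile = 213_489_335  # one or both concepts lack a concept profile

# Every gene can in principle be paired with every disease.
total_pairs = genes * diseases

# Pairs for which both concepts have a concept profile.
pairs_with_profiles = total_pairs - pairs_without_profile

# Share of pairs that cannot be scored ("more than half").
fraction_without_profile = pairs_without_profile / total_pairs

print(f"total pairs:            {total_pairs:,}")
print(f"pairs with profiles:    {pairs_with_profiles:,}")
print(f"share without profiles: {fraction_without_profile:.1%}")
```

The product reproduces the 417,561,711 possible pairs, and the excluded share comes out slightly above one half, consistent with the text.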


Introduction

Organizing the knowledge space around disease pathophysiology and phenotypes is crucial for data interpretation [1,2] and translational medicine [3]. Most geneticists attempt to rationalize large and complex datasets using keyword-based searches to interrogate existing knowledge resources. In contrast to keyword approaches, we use concept profiles to expose the associative information contained in MEDLINE [http://www.ncbi.nlm.nih.gov/pubmed] abstracts as a document-independent, weighted semantic network of disambiguated biomedical concepts. From this network we expose all gene-disease associations, forming a “Literature Wide Association Study” or LWAS. The vast majority of gene-disease associations are implicit, that is, the gene and the disease are linked through their mutual association with intermediate concepts rather than by an explicit statement in the literature. We call the network of implicit relations the gene-disease implicitome.
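To make the idea of an implicit association concrete, here is a minimal sketch in which a concept profile is a mapping from associated concepts to weights, and a gene and a disease are matched on the intermediate concepts they share. Everything in it is an assumption for illustration: the names GENE_A and DISEASE_X, the intermediate concepts, the weights, and the use of an inner product as the match score. This section does not specify how the authors weight or compare profiles.

```python
from typing import Dict, List, Tuple

# A concept profile: associated concept -> association weight (toy values).
Profile = Dict[str, float]


def match_score(a: Profile, b: Profile) -> float:
    """Inner product over the concepts that both profiles contain (assumed scoring)."""
    shared = a.keys() & b.keys()
    return sum(a[c] * b[c] for c in shared)


def linking_concepts(a: Profile, b: Profile) -> List[Tuple[str, float]]:
    """Shared intermediate concepts, ordered by their contribution to the score."""
    shared = a.keys() & b.keys()
    return sorted(((c, a[c] * b[c]) for c in shared), key=lambda item: -item[1])


# Hypothetical profiles. Neither profile mentions the other concept directly,
# so any association between them is implicit.
gene_a: Profile = {"calcium signaling": 0.6, "muscle contraction": 0.3, "apoptosis": 0.1}
disease_x: Profile = {"muscle contraction": 0.5, "calcium signaling": 0.4, "fatigue": 0.1}

score = match_score(gene_a, disease_x)
links = linking_concepts(gene_a, disease_x)

print(f"match score: {score:.2f}")
for concept, contribution in links:
    print(f"  via {concept}: {contribution:.2f}")
```

In this toy case the two profiles never name each other, yet they receive a non-zero score because both are associated with the same intermediate concepts. The ranked list of linking concepts is what would let a reader rationalize why the pair was suggested.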

