Abstract

Computational manipulation of knowledge is an important, and often under-appreciated, aspect of biomedical Data Science. The first Data Science initiative from the US National Institutes of Health was entitled “Big Data to Knowledge (BD2K).” The main emphasis of the more than $200M allocated to that program has been on “Big Data;” the “Knowledge” component has largely been the implicit assumption that the work will lead to new biomedical knowledge. However, there is long-standing and highly productive work in computational knowledge representation and reasoning, and computational processing of knowledge has a role in the world of Data Science. Knowledge-based biomedical Data Science involves the design and implementation of computer systems that act as if they knew about biomedicine. There are many ways in which a computational approach might act as if it knew something: for example, it might be able to answer a natural language question about a biomedical topic, or pass an exam; it might be able to use existing biomedical knowledge to rank or evaluate hypotheses; it might explain or interpret data in light of prior knowledge, either in a Bayesian or other sort of framework. These are all examples of automated reasoning that acts on computational representations of knowledge. After a brief survey of existing approaches to knowledge-based data science, this position paper argues that such research is ripe for expansion and for wider application.

Highlights

  • Computational manipulation of knowledge is an important, and often under-appreciated, aspect of biomedical Data Science

  • The main emphasis of the more than $200M allocated to the NIH “Big Data to Knowledge (BD2K)” program has been on “Big Data;” the “Knowledge” component has largely been the implicit assumption that the work will lead to new biomedical knowledge

  • There is long-standing and highly productive work in computational knowledge representation and reasoning, and computational processing of knowledge has a role in the world of Data Science

Summary

Representations of biomedical knowledge

All computational approaches to knowledge require a specification of how the computer system represents knowledge internally, and how it might compute with those representations to produce outputs (often called, perhaps metaphorically, reasoning). The authors of [16] focus on what ontological commitments a knowledge representation makes, what inferences are possible with it, and, sometimes, which of those inferences can be made efficiently. These issues remain useful in thinking about how knowledge representation and reasoning play a role in today’s data science environment. Building on decades of work in artificial intelligence research, the W3C produced a collection of international standards for assembling ontological entities into assertions and managing collections of assertions, together referred to as the Semantic Web. The focus of the Semantic Web standards is to make it possible to link web elements with shared meaning, an approach sometimes described as the Linked Data paradigm. While the Semantic Web standards are intended to be general representation tools for all knowledge (e.g. RDF for facilitating exchange of research data), the combination of Semantic Web standards and biomedical ontologies is the basis of most current biomedical knowledge representation systems.
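
To make the idea of assembling ontological entities into assertions concrete, the following minimal sketch stores assertions as subject-predicate-object triples, in the spirit of RDF, and derives new assertions from them. It is an illustration under stated assumptions, not the representation used by any particular system: the `ex:` identifiers and the `infer` function are hypothetical, and only two simple subclass rules are applied, far fewer than a real Semantic Web reasoner would support.

```python
from typing import Set, Tuple

# An assertion is a (subject, predicate, object) triple, as in RDF.
Triple = Tuple[str, str, str]

SUBCLASS_OF = "rdfs:subClassOf"
TYPE = "rdf:type"


def infer(triples: Set[Triple]) -> Set[Triple]:
    """Return the input triples plus those entailed by two rules.

    Rule 1: (A subClassOf B) and (B subClassOf C) => (A subClassOf C)
    Rule 2: (x type A)       and (A subClassOf B) => (x type B)
    The rules are re-applied until no new triple appears (a fixed point).
    """
    closed = set(triples)
    while True:
        new = set()
        for s, p, o in closed:
            if p not in (SUBCLASS_OF, TYPE):
                continue
            for s2, p2, o2 in closed:
                # Follow one subclass edge upward from the object.
                if p2 == SUBCLASS_OF and s2 == o:
                    new.add((s, p, o2))
        new -= closed
        if not new:
            return closed
        closed |= new


# Hypothetical identifiers, invented for this example only.
kb: Set[Triple] = {
    ("ex:kinase", SUBCLASS_OF, "ex:enzyme"),
    ("ex:enzyme", SUBCLASS_OF, "ex:protein"),
    ("ex:geneProductA", TYPE, "ex:kinase"),
}

entailed = infer(kb)
print(("ex:geneProductA", TYPE, "ex:protein") in entailed)
```

The assertion that `ex:geneProductA` is a protein was never stated; it follows from the class hierarchy. This is the kind of question the paragraph above raises about a representation: which inferences it permits, and how cheaply they can be computed.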

Knowledge-based inference
Logical inference
Inference from ontology annotation
Inference from the biomedical literature
Open challenges in knowledge-based Data Science