Abstract

The profusion of high-throughput instruments and the explosion of new results in the scientific literature, particularly in molecular biomedicine, are both a blessing and a curse to the bench researcher. Even knowledgeable and experienced scientists can benefit from computational tools that help navigate this vast and rapidly evolving terrain. In this paper, we describe a novel computational approach to this challenge: a knowledge-based system that combines reading, reasoning, and reporting methods to facilitate analysis of experimental data. Reading methods extract information from external resources, either by parsing structured data or by using biomedical language processing to extract information from unstructured data, and track knowledge provenance. Reasoning methods enrich the knowledge that results from reading by, for example, noting two genes that are annotated to the same ontology term or database entry. Reasoning is also used to combine all sources into a knowledge network that integrates every type of relationship between a pair of genes, and to calculate a combined reliability score. Reporting methods combine the knowledge network with a congruent network constructed from experimental data and visualize the combined network in a tool that facilitates the knowledge-based analysis of that data. An implementation of this approach, called the Hanalyzer, is demonstrated on a large-scale gene expression array dataset relevant to craniofacial development. The tool was critical in generating hypotheses regarding the roles of four genes never previously characterized as involved in craniofacial development; each of these hypotheses was validated by further experimental work.
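
The reasoning step described above can be illustrated with a minimal sketch. It is not the Hanalyzer implementation: the function names, the gene and term identifiers, the per-source reliability values, and the choice of a noisy-OR rule for the combined reliability score are all assumptions made for illustration. The sketch infers an edge between two genes that share an annotation, merges assertions from several sources into one edge per gene pair, and keeps the list of contributing sources as provenance.

```python
from collections import defaultdict
from itertools import combinations


def shared_annotation_edges(annotations, source, reliability):
    """Return one edge per gene pair that shares at least one annotation term.

    annotations: dict mapping a gene name to a set of term identifiers.
    source, reliability: hypothetical label and score for this knowledge source.
    """
    edges = []
    for gene_a, gene_b in combinations(sorted(annotations), 2):
        if annotations[gene_a] & annotations[gene_b]:
            edges.append((gene_a, gene_b, source, reliability))
    return edges


def combine_edges(edges):
    """Merge assertions into one edge per unordered gene pair.

    The combined score uses a noisy-OR rule, 1 - prod(1 - r_i), which is an
    assumption of this sketch and not something the abstract specifies.
    """
    network = defaultdict(lambda: {"sources": [], "score": 0.0})
    for gene_a, gene_b, source, reliability in edges:
        key = tuple(sorted((gene_a, gene_b)))
        entry = network[key]
        entry["sources"].append(source)  # provenance of each assertion
        entry["score"] = 1.0 - (1.0 - entry["score"]) * (1.0 - reliability)
    return dict(network)


# Hypothetical input: placeholder gene and term names, made-up reliabilities.
annotations = {
    "geneA": {"term1", "term2"},
    "geneB": {"term2"},
    "geneC": {"term3"},
}
edges = shared_annotation_edges(annotations, "ontology", 0.5)
edges.append(("geneB", "geneA", "text_mining", 0.6))
knowledge_network = combine_edges(edges)

for pair, entry in knowledge_network.items():
    print(pair, entry["sources"], round(entry["score"], 3))
```

Under a noisy-OR rule, each additional independent source can only raise the combined score, so a gene pair supported by several weak sources can outrank a pair supported by a single moderate one. Other combination rules would fit the same structure.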

Highlights

  • Human knowledge relevant to biomedical research is expanding at an exponential pace

  • The journal Nucleic Acids Research publishes an annual compendium of peer-reviewed databases relevant to molecular biology; the 2008 issue reported on 1,078 such databases [3]

  • The computer program described in this paper “reads” the biomedical literature and molecular biology databases, “reasons” about what all that information means to this experiment, and “reports” on its findings in a way that makes digesting all of this information far more efficient than ever before possible


Introduction

Over the last twenty years, more than 10 million publications have been indexed by the National Library of Medicine (NLM) and made available through PubMed, reflecting a compounded annual growth rate of more than 4.8% [1,2]. Structured knowledge, in the form of databases relevant to molecular biology, has also been growing at an impressive rate. The journal Nucleic Acids Research publishes an annual compendium of peer-reviewed databases relevant to molecular biology; the 2008 issue reported on 1,078 such databases [3]. The authors of [4; figure 1] demonstrated that nearly 40% of the more than 5,000 journals indexed in PubMed in a typical year contained at least one assertion regarding protein transport, interaction, or expression that could be found by a text mining system.
