Abstract

Innovative biomedical librarians and information specialists who want to expand their roles as expert searchers need to know about profound changes in biology and parallel trends in text mining. In recent years, conceptual biology has emerged as a complement to empirical biology. This is partly in response to the availability of massive digital resources such as the network of databases for molecular biologists at the National Center for Biotechnology Information. Developments in text mining and hypothesis discovery systems based on the early work of Swanson, a mathematician and information scientist, are coincident with the emergence of conceptual biology. Very little has been written to introduce biomedical digital librarians to these new trends. This paper presents background for data and text mining, as well as for knowledge discovery in databases (KDD) and in text (KDT), followed by a brief review of Swanson's ideas and a discussion of recent approaches to hypothesis discovery and testing. 'Testing' in the context of text mining involves partially automated methods for finding evidence in the literature to support hypothetical relationships. Concluding remarks address (a) the limits of current strategies for evaluating hypothesis discovery systems and (b) the role of literature-based discovery in concert with empirical research. A report of an informatics-driven literature review for biomarkers of systemic lupus erythematosus is also mentioned. Swanson's vision of the hidden value in the literature of science and, by extension, in biomedical digital databases, is still remarkably generative for information scientists, biologists, and physicians.

Highlights

  • When biomedical researchers pose reference questions in the context of conceptual biology, librarians and information specialists may be puzzled.

  • Innovative information professionals with requisite skills and motivation can add value to the usual array of services by expanding their roles as expert searchers. They need to know about profound changes in biology and parallel trends in text mining, a kind of computerized data mining that searches for meaningful patterns of text, such as strings of nucleotides or clinical concepts, in databases annotated by human experts.

  • If no other criteria for demonstrating validity exist, evaluation must await tests by empiricists who happen to find the results interesting [9]. This is a major problem for developers of hypothesis-generating systems.

Summary

Introduction

When biomedical researchers pose reference questions in the context of conceptual biology, librarians and information specialists may be puzzled.

Their results gave credence to Swanson's strategy by confirming the link between Raynaud's syndrome and dietary fish oil. They introduced lexical and statistical methods for mining abstracts instead of titles and developed computer-based tools to support discovery. Weeber and colleagues [33] developed a concept-based Natural Language Processing system called DAD (Drug-Adverse Drug Reaction-Disease) to assist biomedical experts in formulating and testing hypotheses, primarily for drug discovery studies. They bypassed the difficulties of extracting words, obviating the need for stop lists and complex queries for synonyms and variants, by mapping words in titles and abstracts to concepts in the Unified Medical Language System (UMLS) Metathesaurus, one of three components in the National Library of Medicine's UMLS [34]. Unlike the clusters produced by Stegmann and Grohmann [30], their results are ranked term lists.
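The idea of linking two literatures through shared intermediate concepts and returning a ranked list can be illustrated with a minimal sketch. It is not the DAD system or Swanson's own procedure. It assumes a simple co-occurrence model in which each record has already been reduced to a set of concept labels, and it ranks candidate concepts by the number of distinct linking concepts. The records, the concept labels, and the function names `cooccurring` and `rank_targets` are all hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical toy records: each set stands for the concepts assigned to one
# title or abstract. These are illustrative labels, not real bibliographic data.
RECORDS = [
    {"raynaud syndrome", "blood viscosity"},
    {"raynaud syndrome", "platelet aggregation"},
    {"raynaud syndrome", "blood viscosity", "vasoconstriction"},
    {"fish oil", "blood viscosity"},
    {"fish oil", "platelet aggregation"},
    {"fish oil", "triglycerides"},
    {"aspirin", "platelet aggregation"},
]


def cooccurring(records, concept):
    """Count the records in which each other concept appears with `concept`."""
    counts = Counter()
    for concepts in records:
        if concept in concepts:
            counts.update(concepts - {concept})
    return counts


def rank_targets(records, start):
    """Rank concepts never recorded with `start` by their number of linking concepts."""
    links = cooccurring(records, start)  # intermediate concepts seen with `start`
    scores = {}
    for b in links:
        for c in cooccurring(records, b):
            # Keep only indirect connections: skip the start concept itself
            # and anything that already co-occurs with it directly.
            if c != start and c not in links:
                scores.setdefault(c, set()).add(b)
    return sorted(
        ((c, sorted(bs)) for c, bs in scores.items()),
        key=lambda item: (-len(item[1]), item[0]),
    )


if __name__ == "__main__":
    for target, via in rank_targets(RECORDS, "raynaud syndrome"):
        print(f"{target}: linked via {', '.join(via)}")
```

In this toy setting, a concept ranks higher when more intermediate concepts connect it to the starting concept. A real system would work over concepts mapped from a large bibliographic database and would need more careful weighting and filtering than a raw count.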

Conclusion