LION LBD: a literature-based discovery system for cancer biology.

Sampo Pyysalo, Ulla Stenius, Stefan Haselwimmer, Imran Ali, Tejas Shah, Andrew Young, Johan Högberg, Masashi Narita, Anna Korhonen, Simon Baker, Yufan Guo, Russell Schwartz

doi:10.1093/bioinformatics/bty845

Abstract

Motivation: The overwhelming size and rapid growth of the biomedical literature make it impossible for scientists to read all studies related to their work, potentially leading to missed connections and wasted time and resources. Literature-based discovery (LBD) aims to alleviate these issues by identifying implicit links between disjoint parts of the literature. While LBD has been studied in depth since its introduction three decades ago, there has been limited work making use of recent advances in biomedical text processing methods in LBD.

Results: We present LION LBD, a literature-based discovery system that enables researchers to navigate published information and supports hypothesis generation and testing. The system is built with a particular focus on the molecular biology of cancer using state-of-the-art machine learning and natural language processing methods, including named entity recognition and grounding to domain ontologies covering a wide range of entity types and a novel approach to detecting references to the hallmarks of cancer in text. LION LBD implements a broad selection of co-occurrence based metrics for analyzing the strength of entity associations, and its design allows real-time search to discover indirect associations between entities in a database of tens of millions of publications while preserving the ability of users to explore each mention in its original context in the literature. Evaluations of the system demonstrate its ability to identify undiscovered links and rank relevant concepts highly among potential connections.

Availability and implementation: The LION LBD system is available via a web-based user interface and a programmable API, and all components of the system are made available under open licenses from the project home page http://lbd.lionproject.net.

Supplementary information: Supplementary data are available at Bioinformatics online.
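The idea of scoring entity associations by co-occurrence and then searching for indirect links can be illustrated with a small sketch. The code below is not the LION LBD implementation or its API; it is a minimal toy example that assumes documents are already reduced to sets of recognized entity names, uses the Jaccard index as one possible co-occurrence metric, and ranks indirect candidates by summing, over each intermediate entity, the weaker of the two link scores. The function names (build_counts, jaccard, open_discovery), the aggregation rule, and the toy corpus are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def build_counts(docs):
    """Count in how many documents each entity and each entity pair occurs."""
    single = defaultdict(int)
    pair = defaultdict(int)
    for entities in docs:
        unique = set(entities)
        for e in unique:
            single[e] += 1
        # Store each unordered pair once, under a sorted key.
        for a, b in combinations(sorted(unique), 2):
            pair[(a, b)] += 1
    return single, pair


def jaccard(a, b, single, pair):
    """Jaccard index of the document sets mentioning a and b."""
    key = (a, b) if a < b else (b, a)
    both = pair.get(key, 0)
    union = single[a] + single[b] - both
    return both / union if union else 0.0


def open_discovery(start, docs):
    """Rank entities linked to `start` only through an intermediate entity."""
    single, pair = build_counts(docs)
    neighbours = defaultdict(set)
    for a, b in pair:
        neighbours[a].add(b)
        neighbours[b].add(a)

    scores = defaultdict(float)
    for b in neighbours[start]:
        ab = jaccard(start, b, single, pair)
        for c in neighbours[b]:
            # Skip the start entity and anything it already co-occurs with.
            if c == start or c in neighbours[start]:
                continue
            bc = jaccard(b, c, single, pair)
            # Assumed aggregation: a path is as strong as its weaker link.
            scores[c] += min(ab, bc)
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical toy corpus: each document is a set of entity mentions.
docs = [
    {"fish oil", "blood viscosity"},
    {"fish oil", "platelet aggregation"},
    {"blood viscosity", "Raynaud disease"},
    {"platelet aggregation", "Raynaud disease"},
    {"fish oil", "blood viscosity"},
]

if __name__ == "__main__":
    for entity, score in open_discovery("fish oil", docs):
        print(entity, round(score, 3))
```

In this toy corpus "fish oil" and "Raynaud disease" never appear in the same document, so the only candidates returned are those reachable through a shared intermediate entity. A production system would work from precomputed counts over a large indexed database rather than recomputing them per query.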

Highlights

The enormous size and exponential growth of the scientific literature make it increasingly difficult for researchers to stay up to date on all developments in their field, let alone on those in related areas of study (Simpson and Demner-Fushman, 2012).
The system is built with a particular focus on the molecular biology of cancer using state-of-the-art machine learning and natural language processing methods, including named entity recognition and grounding to domain ontologies covering a wide range of entity types and a novel approach to detecting references to the hallmarks of cancer in text.
Availability and implementation: The LION LBD system is available via a web-based user interface and a programmable application programming interface (API), and all components of the system are made available under open licenses from the project home page http://lbd.lionproject.net.

Summary

Introduction

The enormous size and exponential growth of the scientific literature make it increasingly difficult for researchers to stay up to date on all developments in their field, let alone on those in related areas of study (Simpson and Demner-Fushman, 2012). The problem is especially challenging in complex and tightly interconnected areas of biomedical research such as cancer, which is the subject of millions of existing publications.

Methods

Results

Conclusion