Abstract

IBM Watson for Drug Discovery (WDD) is a cognitive computing software platform for early-stage pharmaceutical research. WDD extracts and cross-references life sciences information from very large-scale structured and unstructured data, identifying connections and correlations in an unbiased manner and enabling more informed decision-making through explainable analytics and scientific visualizations. This paper describes in detail the high-throughput natural language processing system implemented in WDD. This system enables a new WDD release every three weeks. Each release incorporates the latest publications into a continually growing corpus of over 30 million scientific and intellectual property documents, and every document is reprocessed using the latest annotators and structured reference data to extract a set of domain-relevant entity and relationship concepts. The hybrid approach to natural language processing in WDD uses model- and rule-based techniques in concert for high-performance named entity recognition, applies a similar ensemble approach to named entity resolution tasks, and culminates in semantic relationship extraction. Statistics on full-scale annotation results and example use cases are also provided.
