Development and tuning of an original search engine for patent libraries in medicinal chemistry

Emilie Pasche, Julien Gobeill, Fatma Oezdemir-Zaech, Therese Vachon, Olivier Kreim, Christian Lovis, Patrick Ruch

doi:10.1186/1471-2105-15-s1-s15

Open Access

https://doi.org/10.1186/1471-2105-15-s1-s15

Abstract

Background

The large increase in the size of patent collections has created a need for efficient search strategies. Yet advanced text-mining applications dedicated to biomedical patents remain rare, in particular applications that address the needs of the pharmaceutical and biotech industry, which uses patent libraries intensively for competitive intelligence and drug development.

Methods

We describe the development of an advanced retrieval engine for searching patent collections in the field of medicinal chemistry. We investigate and combine different strategies and evaluate their respective impact on the performance of the search engine across several search tasks, which cover what are putatively the most frequent search behaviours of intellectual property officers in medicinal chemistry: 1) a prior art search task; 2) a technical survey task; and 3) a variant of the technical survey task, sometimes called a known-item search task, in which a single patent is targeted.

Results

The optimal tuning of our engine resulted in a top-precision of 6.76% for the prior art search task, 23.28% for the technical survey task and 46.02% for the variant of the technical survey task. Co-citation boosting was an appropriate strategy for improving prior art search, while IPC classification of queries improved retrieval effectiveness for technical survey tasks. Surprisingly, using the full body of the patent was always detrimental to search effectiveness. Normalizing biomedical entities using curated dictionaries had no impact on the search tasks we evaluated. The search engine was finally implemented as a web application within Novartis Pharma; the application is briefly described in the report.

Conclusions

We have presented the development of a search engine dedicated to patent search, based on state-of-the-art methods applied to patent corpora. We have shown that properly tuning the system to each search task clearly increases its effectiveness. We conclude that different search tasks demand different information retrieval engine settings in order to yield optimal end-user retrieval.
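The abstract does not spell out how co-citation boosting is computed. As a rough illustration only, and not the authors' implementation, the Python sketch below assumes one simplified, citation-based form of boosting: each patent cited by a top-ranked result receives a bonus proportional to the score of the citing result. The function name `boost_by_citations`, the `weight` and `top_k` parameters, and the toy data are all hypothetical.

```python
from collections import defaultdict


def boost_by_citations(scores, citations, weight=0.5, top_k=10):
    """Re-rank baseline retrieval scores with a simple citation-based bonus.

    scores:    dict mapping patent id -> baseline retrieval score
    citations: dict mapping patent id -> list of patent ids it cites
    weight:    hypothetical factor controlling the size of the bonus
    top_k:     only the top_k baseline results propagate a bonus
    """
    ranked = sorted(scores, key=scores.get, reverse=True)[:top_k]
    boosted = defaultdict(float, scores)
    for citing in ranked:
        for cited in citations.get(citing, []):
            # A cited patent inherits a fraction of the citing patent's score,
            # even if it was absent from the baseline result list.
            boosted[cited] += weight * scores[citing]
    return sorted(boosted.items(), key=lambda item: item[1], reverse=True)


# Toy data, invented for illustration.
baseline = {"P1": 2.0, "P2": 1.5, "P3": 0.4}
cites = {"P1": ["P3", "P4"], "P2": ["P3"]}
reranked = boost_by_citations(baseline, cites)
```

In this toy run, P3 is cited by both top results and so moves ahead of them, while P4, which the baseline did not retrieve at all, enters the ranking. Whether such a bonus helps presumably depends on the task, which is in line with the abstract's observation that the strategy suited prior art search.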

Highlights

The large increase in the size of patent collections has created a need for efficient search strategies
One of the most popular competitions to evaluate and compare search engines, the Text REtrieval Conferences (TREC) [6], has lately set up an information retrieval track dedicated to patent search for chemistry, called TREC-Chem [7]
The delivered search engine has been implemented as a web application that we briefly describe at the end of the paper

Summary

Introduction

The large increase in the size of patent collections has created a need for efficient search strategies. The prior art (PA) task aims to determine how systems can help recover the prior art of a given patent. For this task, queries are full-text patents. In the technical survey task, by contrast, queries are relatively short (typically a few sentences), and the systems must retrieve a set of relevant patents that fulfil a particular information need. In the TREC-Chem setting, a collection of about 1.3 million patents is provided to participants, together with queries for both tasks. Relevance judgments are defined after submission of the runs. The participants in such competitions have explored various strategies.

Journal: BMC Bioinformatics	Publication Date: Jan 1, 2014
Citations: 33	License type: cc-by


