Abstract

A patent gives the owner of an invention the exclusive right to make, use, and sell that invention. Before a new patent application is filed, patent lawyers are required to engage in prior art search to determine the likelihood that an invention is novel or valid, or to make sense of the domain. To perform this search, existing platforms rely on keywords and Boolean logic, which disregard the syntax and semantics of natural language and thus make the search extremely difficult. Studies that address semantics with neural embeddings exist, but they consider only a narrow, unidirectional context of words. In this study, we propose an end-to-end framework for prior art search that considers the bidirectional semantics, syntax, and thematic nature of natural language. The proposed framework goes beyond keyword queries and takes an entire patent as input. The contribution of this paper is twofold: first, adapting pre-trained embedding models (e.g., BERT) to capture the semantics and syntax of language; second, exploiting topic modeling to build a diversified answer set that covers all themes across the domains of the input patent. We evaluate the proposed framework on the CLEF-IP 2011 benchmark dataset and on a real-world dataset obtained from the Google Patents repository, and show that it outperforms existing methods and returns meaningful results for a given patent.
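The two-stage design the abstract describes can be illustrated with a minimal sketch, not the authors' implementation: embed the query patent and candidate patents with a pre-trained bidirectional encoder, rank candidates by cosine similarity, and then use a topic model to diversify the top of the ranking. The encoder name (`all-MiniLM-L6-v2`), the toy corpus, and the topic count below are all illustrative assumptions.

```python
# Hedged sketch of embedding-based retrieval + topic-diversified re-ranking.
# Not the paper's system; model, corpus, and num_topics are assumptions.
import numpy as np
from sentence_transformers import SentenceTransformer
from gensim import corpora
from gensim.models import LdaModel

query_patent = "A method for storing energy in a rotating flywheel ..."
candidates = [
    "Flywheel energy storage apparatus with magnetic bearings ...",
    "A lithium-ion battery pack thermal management system ...",
    "Kinetic energy recovery system for vehicles using a flywheel ...",
]

# (1) Bidirectional contextual embeddings (BERT family) instead of keywords.
encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed stand-in model
q = encoder.encode([query_patent])[0]
C = encoder.encode(candidates)

# (2) Semantic ranking by cosine similarity between query and candidates.
sims = C @ q / (np.linalg.norm(C, axis=1) * np.linalg.norm(q))
ranked = np.argsort(-sims)

# (3) Topic modeling over the candidates, then diversify the answer set:
# surface the best-scoring candidate from each topic before the rest.
tokenized = [doc.lower().split() for doc in candidates]
dictionary = corpora.Dictionary(tokenized)
bow = [dictionary.doc2bow(toks) for toks in tokenized]
lda = LdaModel(bow, num_topics=2, id2word=dictionary, random_state=0)
topic_of = [max(lda.get_document_topics(b), key=lambda t: t[1])[0] for b in bow]

reps, rest, seen = [], [], set()
for i in ranked:
    (reps if topic_of[i] not in seen else rest).append(i)
    seen.add(topic_of[i])

for i in reps + rest:
    print(f"topic={topic_of[i]} sim={sims[i]:.3f}  {candidates[i][:50]}")
```

The split into a semantic ranking stage and a topic-based diversification stage mirrors the paper's two stated contributions: the encoder supplies bidirectional semantics and syntax, while the topic model ensures the returned prior art covers the different themes of the input patent rather than one dominant theme.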
