Open Relation Extraction in Patent Claims with a Hybrid Network

Boting Geng, Wenqing Wu

doi:10.1155/2021/5547281

Abstract

Relation extraction from patent documents, a high-priority topic in natural language processing (NLP) in recent years, is of great significance to a series of downstream patent applications, such as patent content mining, patent retrieval, and patent knowledge base construction. Because patent claims contain lengthy sentences, cross-domain technical terms, and complex structure, it is extremely difficult to extract open triples with traditional NLP parsers. In this paper, we propose an Open Relation Extraction (ORE) approach that transforms the relation extraction problem in patent claims into a sequence labeling problem and extracts non-predefined relation triples from patent claims with a hybrid neural network architecture based on a multihead attention mechanism. The hybrid framework combines Bi-LSTM and CNN to extract argument phrase features and relation phrase features simultaneously. The Bi-LSTM network captures long-distance dependency features, and the CNN captures local content features; a multihead attention mechanism is then applied to capture latent dependency relationships across the time steps of the RNN. Applied to the open patent relation dataset we constructed, our method outperforms both traditional machine learning classification algorithms and state-of-the-art neural network classification models in Precision, Recall, and F1.
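To make the sequence labeling formulation concrete, the following minimal Python sketch shows how a tagged claim sentence could be decoded into an open triple and how Precision, Recall, and F1 could be computed over triples. The tag set (B-/I- prefixes over ARG1, REL, ARG2, plus O), the example sentence, the function names, and the choice of triple-level scoring are all assumptions made for illustration; the abstract does not specify the paper's actual tagging scheme or evaluation granularity.

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]


def decode_triple(tokens: List[str], tags: List[str]) -> Optional[Triple]:
    """Turn one BIO-tagged sentence into an (arg1, relation, arg2) triple.

    Assumed (hypothetical) tag set: B-/I- over ARG1, REL, ARG2, plus O.
    Returns None when any of the three roles is missing.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")

    spans: List[Tuple[str, List[str]]] = []  # (role, words) in reading order
    prev_role = None  # role of the previous token, None if it was tagged O
    for token, tag in zip(tokens, tags):
        if tag == "O":
            prev_role = None
            continue
        prefix, role = tag.split("-", 1)
        if prefix == "B" or role != prev_role:
            spans.append((role, [token]))  # start a new phrase
        else:
            spans[-1][1].append(token)     # continue the current phrase
        prev_role = role

    # Keep the first phrase found for each role.
    phrases = {}
    for role, words in spans:
        phrases.setdefault(role, " ".join(words))
    if all(role in phrases for role in ("ARG1", "REL", "ARG2")):
        return phrases["ARG1"], phrases["REL"], phrases["ARG2"]
    return None


def precision_recall_f1(predicted: Set[Triple],
                        gold: Set[Triple]) -> Tuple[float, float, float]:
    """Exact-match scores over sets of triples (an assumed evaluation unit)."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Invented claim-like sentence, for illustration only.
tokens = ["the", "sensor", "is", "mounted", "on", "the", "housing"]
tags = ["B-ARG1", "I-ARG1", "O", "B-REL", "I-REL", "B-ARG2", "I-ARG2"]
triple = decode_triple(tokens, tags)
print(triple)
```

Because the relation phrase is read directly from the tagged tokens, no predefined relation inventory is needed, which is what makes the extraction "open".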

Highlights

With economic development, patent documents, as an extremely important knowledge carrier, record a large number of valuable inventions, creative ideas, and excellent design concepts
Extracting non-predefined relation triples from patent claims, which define a series of rights granted by a government for a limited period, is a vital basic research task underpinning higher-level applications of patent document analysis, such as patent information retrieval [1, 2], patent classification [3], patent categorization [4], and patent knowledge graph construction [5]
We propose a hybrid neural network model for open relation extraction that extracts relation triples from patent claims: a bidirectional long short-term memory (Bi-LSTM) network captures temporal information from the whole sentence, and convolutional neural network (CNN) pooling captures local content information; in addition, multihead attention is incorporated to extract content dependency features that better serve the sequence labeling classification problem
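The multihead attention component is only named here, not defined. As a hedged illustration, assuming the standard scaled dot-product formulation applied as self-attention over the Bi-LSTM outputs, it can be written as follows. The symbols are assumptions introduced for this sketch: H is the n-by-d matrix of Bi-LSTM hidden states for an n-token sentence, h is the number of heads, and W_i^Q, W_i^K, W_i^V, and W^O are learned projection matrices. The paper's exact formulation may differ.

```latex
\begin{aligned}
\mathrm{Attention}(Q, K, V) &= \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V,\\
\mathrm{head}_i &= \mathrm{Attention}\!\left(H W_i^{Q},\; H W_i^{K},\; H W_i^{V}\right), \quad i = 1, \dots, h,\\
\mathrm{MultiHead}(H) &= \mathrm{Concat}(\mathrm{head}_1, \dots, \mathrm{head}_h)\, W^{O}.
\end{aligned}
```

Under this reading, each head can weight a different set of token-to-token dependencies, which matches the stated goal of capturing content dependency features across the sentence.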

Summary

Introduction

With economic development, patent documents, as an extremely important knowledge carrier, record a large number of valuable inventions, creative ideas, and excellent design concepts. We propose a hybrid neural network model for open relation extraction that extracts relation triples from patent claims: a Bi-LSTM network captures temporal information from the whole sentence, and CNN pooling captures local content information; in addition, multihead attention is incorporated to extract content dependency features that better serve the sequence labeling classification problem.

The main contributions are as follows:
(1) A hybrid neural network (Bi-LSTM+CNN+CRF) open relation extraction (ORE) model is proposed for the first time to extract non-predefined triples from patent documents.
(2) A multihead attention technique is used to improve the classification of sequence label dependencies.
(3) We constructed an open patent relation corpus to support supervised approaches to the ORE task in patent analysis, comprising 1309 annotated claims with about 29850 sentences.

A variety of experiments help readers better understand the reliability of our hybrid model.
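The CRF layer named in contribution (1) is not described in this summary. A linear-chain CRF is commonly decoded with the Viterbi algorithm, which picks the tag sequence with the best combined per-token and tag-transition score. The sketch below is a simplified, pure-Python version under stated assumptions: the tag names, the scores, and the `viterbi` function itself are hypothetical, start and end transitions are omitted, and nothing here is taken from the authors' implementation.

```python
from typing import Dict, List


def viterbi(emissions: List[Dict[str, float]],
            transitions: Dict[str, Dict[str, float]]) -> List[str]:
    """Return the highest-scoring tag path for one sentence.

    emissions[t][tag]: score of `tag` at position t (e.g. network output).
    transitions[prev][tag]: score of moving from tag `prev` to `tag`.
    """
    if not emissions:
        return []
    tag_names = list(emissions[0])
    # best[tag] = score of the best path that ends in `tag` at this position
    best = dict(emissions[0])
    backpointers: List[Dict[str, str]] = []
    for scores in emissions[1:]:
        new_best: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for tag in tag_names:
            # Best previous tag for arriving at `tag`
            prev = max(tag_names, key=lambda p: best[p] + transitions[p][tag])
            new_best[tag] = best[prev] + transitions[prev][tag] + scores[tag]
            pointers[tag] = prev
        best = new_best
        backpointers.append(pointers)
    # Follow the back-pointers from the best final tag
    path = [max(tag_names, key=lambda t: best[t])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Made-up scores: a per-token argmax would give the invalid sequence
# O, I-REL; a transition penalty steers decoding to B-REL, I-REL.
tag_set = ["O", "B-REL", "I-REL"]
transitions = {p: {t: 0.0 for t in tag_set} for p in tag_set}
transitions["O"]["I-REL"] = -10.0  # an I- tag should not follow O
emissions = [
    {"O": 1.0, "B-REL": 0.8, "I-REL": 0.0},
    {"O": 0.0, "B-REL": 0.0, "I-REL": 1.0},
]
best_path = viterbi(emissions, transitions)
print(best_path)
```

The design point is that labels are chosen jointly rather than token by token, so constraints between neighboring labels (such as an I- tag needing a preceding B- tag) can be learned as transition scores.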

Related Work

Our Hybrid ORE Neural Framework

Experiments