Text segmentation for patent claim simplification via Bidirectional Long‐Short Term Memory and Conditional Random Field

Boting Geng

doi:10.1111/coin.12455

Journal: Computational Intelligence
Publication Date: May 14, 2021
Citations: 15

Affiliation: Zhejiang University

Abstract

Text simplification is vital for comprehending patent claims because of their complex syntactic structures and lengthy sentences. As a result, almost all patent analysis practitioners are unable to grasp the essence of a patent directly and intuitively, even when common natural language processing (NLP) tools are applied to parse the claim paragraphs or sentences. Such general-purpose text analysis tools are almost useless, and may even crash, when applied to some complex paragraphs of patent claims. It is therefore necessary to propose a simplification approach oriented to patent text that helps patent researchers grasp the essence of a patent quickly and intuitively. Motivated by this, we propose in this article a simplification method based on deep learning that segments patent claims into shorter, comprehensible sentences for downstream patent analysis tasks. The proposed approach contains two stages: in the first stage, a conditional random field (CRF), a machine learning approach, decomposes syntactically complex paragraphs into coarse‐grained sentences with simplified structures and complete semantics; in the second stage, a deep learning architecture, bidirectional long‐short term memory (Bi‐LSTM)‐CRF, segments the coarse‐grained, lengthy sentences from the first stage into fine‐grained, shorter sentences. Compared with a series of baselines, our patent segmentation architecture based on Bi‐LSTM‐CRF achieves higher precision, recall, and F1 than all the other methods.
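
As an illustration only, segmentation of this kind can be framed as token-level sequence labeling and scored with precision, recall, and F1 over predicted segment boundaries. The sketch below assumes a two-tag scheme in which "B" opens a new segment and "I" continues the current one; the abstract does not state the tag set or the exact evaluation protocol, and the function names `tags_to_segments` and `boundary_prf` are hypothetical. The tags would come from a trained tagger such as a CRF or Bi‐LSTM‐CRF, which is not implemented here.

```python
from typing import List, Tuple


def tags_to_segments(tokens: List[str], tags: List[str]) -> List[List[str]]:
    """Group tokens into segments; a "B" tag opens a new segment (assumed scheme)."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    segments: List[List[str]] = []
    for token, tag in zip(tokens, tags):
        # Start a new segment on "B", or if no segment has been opened yet.
        if tag == "B" or not segments:
            segments.append([token])
        else:
            segments[-1].append(token)
    return segments


def boundary_prf(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Precision, recall, and F1 computed over positions tagged "B"."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    gold_b = {i for i, t in enumerate(gold) if t == "B"}
    pred_b = {i for i, t in enumerate(pred) if t == "B"}
    true_pos = len(gold_b & pred_b)
    precision = true_pos / len(pred_b) if pred_b else 0.0
    recall = true_pos / len(gold_b) if gold_b else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy claim fragment; the tags are hand-written, not model output.
    tokens = "A device comprising a sensor , wherein the sensor is optical".split()
    gold = ["B", "I", "I", "I", "I", "I", "B", "I", "I", "I", "I"]
    pred = ["B", "I", "I", "B", "I", "I", "B", "I", "I", "I", "I"]
    for segment in tags_to_segments(tokens, gold):
        print(" ".join(segment))
    print(boundary_prf(gold, pred))
```

In this toy case the prediction contains one spurious boundary, so recall is perfect while precision drops. A two-stage pipeline like the one described would apply the same tag-to-segment step twice, first on paragraphs and then on the resulting coarse sentences.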
