SEA-PS: Semantic embedding with attention to measuring patent similarity by leveraging various text fields

Zihong Wang, Yufei Liu

doi:10.1177/01655515221106651

Abstract

Similarity metrics are critical to identifying the relationships between patents. While many bibliometric methods, such as co-citation and co-classification, fail to use the vast majority of the technical information in the text, most text mining methods focus on keywords in only one text field of the patent document. This article aims to leverage various text fields to measure pairwise patent similarity according to their technological bases. A novel approach called semantic embedding with attention for patent similarity (SEA-PS) is proposed. First, the method identifies technological bases and models their semantic relatedness. To achieve this, we put forward an additional patent stop-word list to help extract technical terms with an n-gram-based statistical method. The technical terms are then mapped into a vector space using word embedding. Second, we propose a graph-based method that allocates weights to distinguish the technical focus, taking into account the linkages between technologies. Finally, we assess the feasibility of each text field and integrate their semantics at the patent level with an attention layer to compute similarity. Validation is conducted from two perspectives: content validity (coverage of technical information, validity of the semantic representations and effectiveness of text field combinations) and external validity against existing methods via an expert panel. The results demonstrate the superiority of SEA-PS over existing methods and suggest that ‘abstracts’, ‘claims’ and ‘technical descriptions’ are more effective than ‘titles’. SEA-PS is a fundamental tool for patent retrieval and classification. It also has a broad range of practical applications in innovation and strategy studies, including identifying technological frontiers and studying knowledge spillovers.
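The pipeline summarised in the abstract (stop-word-filtered n-gram term extraction, word embedding, graph-based term weighting, attention over text fields, then a pairwise similarity score) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the stop lists are small hypothetical examples, the hypothetical `embed` function returns deterministic pseudo-random vectors in place of a trained word-embedding model, term weights come from degree in a sentence-level co-occurrence graph rather than the paper's own graph method, the attention query is simply the mean field vector instead of learned parameters, and cosine similarity is assumed as the final metric. All function and constant names are hypothetical.

```python
import math
import random
import re
import zlib
from collections import Counter
from itertools import combinations

# Hypothetical stop lists: common English words plus patent boilerplate terms.
GENERAL_STOP = {"a", "an", "the", "of", "and", "or", "to", "in", "for", "on", "with", "is", "by"}
PATENT_STOP = {"method", "apparatus", "device", "system", "wherein", "said", "comprising", "claim"}
STOP_WORDS = GENERAL_STOP | PATENT_STOP
DIM = 16  # toy embedding dimension (assumption)
FIELDS = ("title", "abstract", "claims", "description")


def extract_terms(text, n_max=2):
    """Count word n-grams (1 <= n <= n_max) that contain no stop word."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    counts = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if not any(t in STOP_WORDS for t in gram):
                counts[" ".join(gram)] += 1
    return counts


def graph_weights(text):
    """Weight terms by (1 + degree) in a sentence co-occurrence graph, normalised to sum to 1."""
    neighbours = {}
    for sentence in re.split(r"[.;:]", text):
        terms = list(extract_terms(sentence))
        for t in terms:
            neighbours.setdefault(t, set())
        for a, b in combinations(terms, 2):  # link terms sharing a sentence
            neighbours[a].add(b)
            neighbours[b].add(a)
    raw = {t: 1 + len(nb) for t, nb in neighbours.items()}
    total = sum(raw.values())
    return {t: w / total for t, w in raw.items()}


def embed(term):
    """Deterministic pseudo-random vector standing in for a trained word embedding."""
    rng = random.Random(zlib.crc32(term.encode("utf-8")))
    return [rng.gauss(0.0, 1.0) for _ in range(DIM)]


def field_vector(text):
    """Graph-weighted sum of the term embeddings of one text field."""
    vec = [0.0] * DIM
    for term, w in graph_weights(text).items():
        for k, x in enumerate(embed(term)):
            vec[k] += w * x
    return vec


def attend(field_vecs):
    """Scaled dot-product attention over fields; query = mean field vector (assumption)."""
    query = [sum(col) / len(field_vecs) for col in zip(*field_vecs)]
    scores = [sum(a * b for a, b in zip(v, query)) / math.sqrt(DIM) for v in field_vecs]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]  # numerically stable softmax
    alphas = [e / sum(exps) for e in exps]
    return [sum(a * v[k] for a, v in zip(alphas, field_vecs)) for k in range(DIM)]


def patent_vector(patent):
    """Combine the available text fields of a patent into one vector."""
    vecs = [field_vector(patent[f]) for f in FIELDS if patent.get(f)]
    return attend(vecs) if vecs else [0.0] * DIM


def cosine(u, v):
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0 or nv == 0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def similarity(p, q):
    return cosine(patent_vector(p), patent_vector(q))


p1 = {"title": "Lithium battery electrode",
      "abstract": "A lithium battery electrode with a graphite anode coating.",
      "claims": "The electrode of claim 1, wherein the graphite anode is coated."}
p2 = {"title": "Battery anode material",
      "abstract": "Graphite anode material for a lithium battery.",
      "claims": "A method comprising coating the graphite anode."}
print(round(similarity(p1, p1), 6))  # 1.0
print(round(similarity(p1, p2), 3))
```

With random stand-in embeddings the score for `p1` and `p2` only reflects shared surface terms; in a real system the embeddings would be trained on a patent corpus so that related but differently worded technical terms also land close together, and the attention query would be learned rather than fixed.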
