Estimating semantic text similarity (STS) between patents is a critical problem in patent portfolio analysis. Current methods, such as keyword matching, co-word analysis, and even Subject-Action-Object (SAO) algorithms, are poorly suited to patent similarity calculation. They lack fine-grained semantic knowledge, "property-parameter" features, and flexible "functional or non-functional" combinations. Meanwhile, no standardized patent similarity datasets are available. In this paper, we propose a new kind of functional semantic knowledge, Function-Object-Property (FOP) triples, to replace SAO triples and directly improve patent similarity estimation. We also construct and release, for the first time, patent STS datasets (a matching dataset and a ranking dataset) as benchmarks for comparative evaluation. Preliminary results show that FOP-based methods are better suited to STS tasks when combined with IPC codes, weight assignment, and pre-trained patent vectors. Furthermore, deep interaction-based models that use averaged FOP embeddings are recommended as one of the best options for improving semantic learning capability. Finally, we summarize a new patent similarity calculation framework and apply it successfully to patent retrieval. This application shows that the proposed methodology performs strongly across diverse patent STS tasks.