Patent Phrase to Phrase Matching Based on Bert

Zhan Chen

doi:10.54691/bcpbm.v38i.3832

Abstract

Because the US patent archive is very large, a similarity matching system is needed to judge whether an invention has already been granted a patent, so that users can focus on highly similar patent items and ignore dissimilar ones. First, the BERT language model is pre-trained on a large-scale corpus to acquire the semantic characteristics of general language. The pre-trained BERT model is then fine-tuned on a text dataset of patent phrases to learn the semantic features of that text and the specific meanings of its keywords for similarity matching, with task-specific settings such as mean squared error (MSE) as the loss function and a chosen learning rate. The validation results are good as measured by both the MSE loss and the Pearson correlation coefficient. Finally, the model is applied to the test dataset; the results show that the Pearson correlation of all the variables is significant and that the model fits well.
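
The two quantities named in the abstract, the MSE loss and the Pearson correlation coefficient, can be illustrated with a minimal sketch in plain Python. The function names and the example scores below are hypothetical and chosen only for illustration; they are not the paper's code or data.

```python
import math


def mse(predictions, targets):
    """Mean squared error between predicted and reference similarity scores."""
    n = len(predictions)
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / n


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations, divided by the product of the
    # square roots of the summed squared deviations.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (dev_x * dev_y)


# Hypothetical similarity scores in [0, 1] (illustrative values only).
reference_scores = [0.0, 0.25, 0.5, 0.75, 1.0]
predicted_scores = [0.1, 0.2, 0.55, 0.7, 0.9]

print("MSE:", mse(predicted_scores, reference_scores))
print("Pearson r:", pearson(predicted_scores, reference_scores))
```

A lower MSE means the predicted scores are closer to the reference scores in absolute terms, while a Pearson coefficient near 1 means the predictions rank and scale phrase pairs in the same way as the references, even if they are offset.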

Full Text