Patent Text Classification based on Deep Learning and Vocabulary Network

Ran Li, Wangke Yu, Yuying Liu, Qianliang Huang

doi:10.14569/ijacsa.2023.0140107

Abstract

Patent documents are a special long-text format. Traditional deep learning methods extract features from them poorly, so classification performance on patents is weaker than on ordinary text. To address this, this paper constructs a text feature extraction method based on a lexical network, which captures the inner relationship between words and classes. First, the relationship between words and classes is obtained along linear and probability dimensions, and the lexical network is constructed from it. Second, the lexical network features are fused with the features extracted by the deep learning model. Finally, the fused features are trained in the original model to obtain the final classification result. The method is a classification enhancement method: it can classify patent text on its own or improve the accuracy of various types of neural networks in patent text classification. Experimental results show that BERT combined with the lexical network method reaches an accuracy of 82.73%, and that combining the lexical network method with CNN and LSTM increases accuracy by 2.19% and 2.25%, respectively. The experiments also show that the lexical network feature extraction method accelerates model convergence during training and improves the model's classification ability on Chinese patent texts.
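
The three steps in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how the lexical network is built or how the features are fused. The sketch assumes that the "probability dimension" can be approximated by an estimate of P(class | word) from document counts, and that fusion is plain concatenation. It leaves out the "linear dimension" entirely. The function names (`build_word_class_probs`, `lexical_features`, `fuse`) and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def build_word_class_probs(docs, labels, classes):
    """Estimate P(class | word) from tokenized training documents.

    Assumption: document-level co-occurrence counts stand in for the
    word-class relationship described in the abstract.
    """
    word_class = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        for word in set(tokens):  # count each word once per document
            word_class[word][label] += 1
    probs = {}
    for word, counts in word_class.items():
        total = sum(counts.values())
        probs[word] = [counts[c] / total for c in classes]
    return probs


def lexical_features(tokens, probs, n_classes):
    """Average the per-word class distributions over the known words."""
    rows = [probs[w] for w in tokens if w in probs]
    if not rows:
        return [0.0] * n_classes  # no known word: neutral feature vector
    return [sum(col) / len(rows) for col in zip(*rows)]


def fuse(deep_features, lex_features):
    """Fuse by concatenation (an assumed choice, one of several possible)."""
    return list(deep_features) + list(lex_features)


# Toy data: class names, tokens, and the "deep" vector are made up.
classes = ["A", "B"]
docs = [["battery", "cell"], ["battery", "anode"], ["engine", "piston"]]
labels = ["A", "A", "B"]

probs = build_word_class_probs(docs, labels, classes)
lex = lexical_features(["battery", "piston"], probs, len(classes))
fused = fuse([0.1, 0.2, 0.3], lex)  # stand-in for CNN/LSTM/BERT features
```

In a real pipeline, the fused vector would be passed to the classifier head of the underlying model (CNN, LSTM, or BERT) for further training, which corresponds to the third step in the abstract.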

Full Text