Abstract

Various studies in recent years have focused on feature extraction methods for automatic patent classification. However, most of these approaches rely on knowledge from experts in the related domains. Here we propose a hierarchical feature extraction model (HFEM) for multi-label mechanical patent classification, which captures both the local features of phrases and the global and temporal semantics of the text. First, an n-gram feature extractor based on convolutional neural networks (CNNs) is designed to extract salient local lexical-level features. Next, a long-dependency feature extraction model based on the bidirectional long–short-term memory (BiLSTM) neural network is proposed to capture sequential correlations from higher-level sequence representations. Then the HFEM algorithm and its hierarchical feature extraction architecture are detailed. We establish training, validation, and test datasets containing 72,532, 18,133, and 2,679 mechanical patent documents, respectively, and evaluate the performance of HFEM on them. Finally, we compare the results of the proposed HFEM with those of three single neural network models, namely CNN, long–short-term memory (LSTM), and BiLSTM. The experimental results indicate that the proposed HFEM outperforms the compared models in both precision and recall.
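
To make the reported precision and recall concrete for the multi-label setting, the following is a minimal sketch of how both could be computed when the top-k scored labels of each document are taken as its predictions. The function name `precision_recall_at_k`, the micro-averaging over documents, and the toy scores are assumptions made for illustration; the abstract does not state the exact metric definitions used in the experiments.

```python
from typing import List, Sequence, Set, Tuple


def precision_recall_at_k(
    scores: Sequence[Sequence[float]],
    true_labels: Sequence[Set[int]],
    k: int,
) -> Tuple[float, float]:
    """Micro-averaged precision and recall over the top-k labels per document.

    scores[i][j]   -- model score of label j for document i (hypothetical input)
    true_labels[i] -- set of gold label indices for document i
    """
    hits = 0       # predicted labels that are also gold labels
    predicted = 0  # total number of predicted labels
    relevant = 0   # total number of gold labels
    for doc_scores, gold in zip(scores, true_labels):
        # Rank label indices by descending score and keep the best k.
        ranked: List[int] = sorted(
            range(len(doc_scores)), key=lambda j: doc_scores[j], reverse=True
        )
        top_k = ranked[:k]
        hits += sum(1 for j in top_k if j in gold)
        predicted += len(top_k)
        relevant += len(gold)
    precision = hits / predicted if predicted else 0.0
    recall = hits / relevant if relevant else 0.0
    return precision, recall


# Toy example: two documents, three candidate labels (made-up numbers).
toy_scores = [[0.9, 0.1, 0.8], [0.2, 0.7, 0.1]]
toy_gold = [{0, 2}, {0}]
p1, r1 = precision_recall_at_k(toy_scores, toy_gold, k=1)
p2, r2 = precision_recall_at_k(toy_scores, toy_gold, k=2)
```

In this toy case, raising k from 1 to 2 increases recall because more gold labels fall inside the predicted set; how precision changes depends on how many of the extra predictions are correct.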

Highlights

  • The World Intellectual Property Organization (WIPO) developed the International Patent Classification (IPC) as a standard taxonomy to classify patents and their applications

  • When an enormous number of patent applications come to the local patent office, it could be a nightmare for the patent examiners

  • This paper presents a hybrid hierarchical feature extraction model (HFEM) for multi-label mechanical patent classification

Summary

Introduction

The World Intellectual Property Organization (WIPO) developed the International Patent Classification (IPC) as a standard taxonomy to classify patents and their applications. Some researchers have tried to identify which parts of a patent document provide the most representative information for classification tasks [5,13]. Almost all of these studies rely heavily on hand-crafted feature engineering: researchers have to design sophisticated feature extractors for patent documents to achieve competitive performance in the PAC system. Convolutional neural networks (CNNs) can capture salient local lexical-level features, and bidirectional long–short-term memory (BiLSTM) networks can learn long-term dependencies from sequences of higher-level representations in the patent text [16,17]. This paper presents a hybrid hierarchical feature extraction model (HFEM) for multi-label mechanical patent classification. Our algorithm adopts a CNN and a BiLSTM to capture local lexical-level features and long-dependency sentence-level features, respectively, and uses a supervised feature learning scheme to extract features from patent documents automatically, without any expert knowledge.
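
The two-stage idea described above (a CNN over n-gram windows, followed by a BiLSTM over the resulting higher-level sequence) can be illustrated with a small forward-pass sketch in NumPy. This is not the paper's implementation: the layer sizes, the random initialisation, the helper names (`conv_ngram`, `lstm_pass`, `init_params`, `forward`), the choice to concatenate the final forward and backward hidden states, and the sigmoid output layer for multi-label scores are all assumptions made for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def conv_ngram(embeddings, filters, bias):
    """Slide n-gram filters over a token sequence and apply ReLU.

    embeddings: (T, d) token embeddings; filters: (n, d, f); bias: (f,).
    Returns (T - n + 1, f): one local feature vector per n-gram window.
    """
    n = filters.shape[0]
    T = embeddings.shape[0]
    windows = np.stack([embeddings[t:t + n] for t in range(T - n + 1)])
    out = np.tensordot(windows, filters, axes=([1, 2], [0, 1])) + bias
    return np.maximum(out, 0.0)


def lstm_pass(inputs, W, U, b, reverse=False):
    """Run one LSTM direction over inputs (L, f); return the final hidden state."""
    hidden = U.shape[0]
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    steps = inputs[::-1] if reverse else inputs
    for x in steps:
        z = x @ W + h @ U + b              # all four gates at once, shape (4 * hidden,)
        i, f, o, g = np.split(z, 4)        # input, forget, output gates and candidate
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
    return h


def init_params(rng, vocab=50, d=8, n=3, f=6, hidden=4, labels=5):
    """Random parameters with hypothetical sizes (illustration only)."""
    def lstm():
        return (rng.normal(0, 0.1, (f, 4 * hidden)),
                rng.normal(0, 0.1, (hidden, 4 * hidden)),
                np.zeros(4 * hidden))
    return {
        "embedding": rng.normal(0, 0.1, (vocab, d)),
        "filters": rng.normal(0, 0.1, (n, d, f)),
        "conv_bias": np.zeros(f),
        "lstm_fwd": lstm(),
        "lstm_bwd": lstm(),
        "out_w": rng.normal(0, 0.1, (2 * hidden, labels)),
        "out_b": np.zeros(labels),
    }


def forward(token_ids, params):
    """Token ids -> per-label probabilities (multi-label, so sigmoid, not softmax)."""
    emb = params["embedding"][token_ids]                           # (T, d)
    local = conv_ngram(emb, params["filters"], params["conv_bias"])  # local n-gram features
    h_fwd = lstm_pass(local, *params["lstm_fwd"])                  # left-to-right context
    h_bwd = lstm_pass(local, *params["lstm_bwd"], reverse=True)    # right-to-left context
    doc = np.concatenate([h_fwd, h_bwd])                           # assumed concatenation
    return sigmoid(doc @ params["out_w"] + params["out_b"])


rng = np.random.default_rng(0)
params = init_params(rng)
probs = forward(np.array([3, 17, 5, 42, 8, 1]), params)  # toy token ids
```

Each entry of `probs` is an independent score for one label, which is what allows a document to receive several classification codes at once. In a trained model the parameters would be learned from labelled patents rather than sampled at random.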

Feature Extraction from Text
Patent Classification
Deep Learning in Text Feature Extraction
The Architecture of the Hierarchical Feature Extraction Model and Algorithm
Datasets and Evaluation Metrics
Title Abstract
Performance Analysis of HFEM with Different Concatenation Strategies
Comparison and Analysis with Other Methods
Experimental Setup
Findings
Precision Top 1