The rapid growth of patent applications poses an unprecedented challenge for efficient intellectual property management, and intelligent approaches are urgently needed to analyze the intrinsic patterns of patents. A long-standing obstacle, however, is the lack of effective methods for modeling the dynamic and diverse examination process of patent applications; such modeling could benefit a wide range of downstream patent management tasks. The major challenge lies in discovering and integrating domain-specific properties from large-scale unlabeled examination data. To this end, we propose a Self-supervised Examination Process Modeling (SEPM) framework that learns contextualized embeddings for patents by modeling their examination processes. Specifically, we first design a multi-aspect event embedding layer, which leverages a fine-tuned language model, frequent-pattern embedding, and time encoding to capture the semantic, frequent-pattern, and temporal information of examination events, respectively. Then, a mutual-information-aware integration layer fuses the extracted features into a multi-aspect embedding that accounts for their mutual interactions. Further, we develop a multi-objective sequential neural network that learns the contextualized patent representation by jointly optimizing two self-supervised objectives: event code auto-regression and event lag auto-regression. To explore the application potential of SEPM, we fine-tune the trained model on three important downstream tasks of patent management: next-event prediction, patent classification, and grant prediction. Extensive experiments on real-world data from the US Patent and Trademark Office verify the effectiveness and application prospects of the proposed framework.
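
As a rough illustration of how two auto-regressive objectives of this kind might be combined, the sketch below averages a next-event-code loss and a next-event-lag loss over a single examination sequence. The abstract does not specify the loss functions or how they are weighted, so the softmax cross-entropy for event codes, the squared error for lags, the `lag_weight` parameter, and all function names here are assumptions made purely for illustration, not the SEPM implementation.

```python
import math


def cross_entropy(logits, target):
    """Negative log-likelihood of `target` under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def joint_autoregressive_loss(code_logits, lag_preds, codes, lags, lag_weight=1.0):
    """Joint next-step loss for one sequence (hypothetical formulation).

    code_logits[t] and lag_preds[t] are the predictions made after seeing
    events 0..t; they are scored against the observed event t + 1.
    """
    steps = len(codes) - 1  # position t predicts event t + 1
    if steps < 1:
        raise ValueError("need at least two events to form a prediction target")
    # Event code auto-regression: classify the next event code (assumed cross-entropy).
    code_loss = sum(
        cross_entropy(code_logits[t], codes[t + 1]) for t in range(steps)
    ) / steps
    # Event lag auto-regression: regress the next time gap (assumed squared error).
    lag_loss = sum((lag_preds[t] - lags[t + 1]) ** 2 for t in range(steps)) / steps
    return code_loss + lag_weight * lag_loss


# Toy sequence: three events with integer codes and inter-event lags (units arbitrary).
codes = [0, 2, 1]
lags = [0.0, 3.0, 1.5]
uniform_logits = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
loss = joint_autoregressive_loss(uniform_logits, [4.0, 1.5], codes, lags)
```

In a setup like this, `lag_weight` would trade off the classification and regression terms; how SEPM actually balances its two objectives is not stated in the abstract.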