A Study of Training Approaches of a Hybrid Summarisation Model Applied to Patent Dataset

Cinthia M Souza, Magali R G Meireles, Leonardo A Souza Filho, Daniel S Bastos

doi:10.1142/s0219649223500302

Abstract

Patents are recognised as an important source of scientific knowledge. Automatic summarisation of patents can help organise patent databases and, consequently, make their contents easier to access. The main contribution of this work is a study of training approaches for a hybrid summarisation model that creates concise, single-sentence summaries of patent documents. The experiments were run on a dataset of more than 80,000 patents made available by the United States Patent and Trademark Office. The selected model was compared with seven state-of-the-art models for extractive, abstractive and hybrid text summarisation (HTS). The results show that the selected approach outperforms the extractive and HTS models and is promising for producing extremely concise summaries. The study concludes that comparing different training approaches, together with analysing the weights of the attention words in the final results, is an important step in this process, because it directly affects the choice of the final summarisation model. In addition, the experiments suggest that removing stop words from the input text did not improve the results, although the attention words extracted by the model using input without stop words were, in general, better.
