Traditional extractive summarization algorithms handle long Chinese patent texts poorly and produce unsatisfactory long summaries. To address these problems, this paper proposes the PatBertSum algorithm, which processes long patent texts (more than 1500 words) efficiently and generates high-quality long summaries (more than 200 words). The method builds on an improved BertSum model and uses the new CLTPDS patent text dataset. It handles long texts with a Head-Tail strategy, adapts the input representation for Chinese, generates sentence vectors with a pre-trained model, and captures both internal text features and text structure features to extract summaries. Experiments show that the method improves ROUGE recall and F-value by more than 8 percentage points over existing methods.
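The Head-Tail step can be illustrated with a minimal sketch. It assumes the common reading of the term: when a token sequence exceeds the encoder's input limit, keep the beginning and the end and drop the middle. The function name `head_tail_truncate`, the 512-token limit, and the 128-token head are illustrative assumptions, not values taken from this paper.

```python
def head_tail_truncate(tokens, max_len=512, head_len=128):
    """Keep the first `head_len` tokens and the last `max_len - head_len` tokens.

    Hypothetical helper: the 512/128 defaults are assumptions for illustration,
    not the split used by PatBertSum.
    """
    if max_len <= 0 or not 0 <= head_len <= max_len:
        raise ValueError("require max_len > 0 and 0 <= head_len <= max_len")

    # Short inputs fit in the encoder as they are.
    if len(tokens) <= max_len:
        return list(tokens)

    tail_len = max_len - head_len
    head = list(tokens[:head_len])
    # Index from the front so that tail_len == 0 yields an empty tail.
    tail = list(tokens[len(tokens) - tail_len:])
    return head + tail


# Example: a 1500-token document reduced to 512 tokens.
# For Chinese text, the tokens would typically be characters.
document = list(range(1500))
truncated = head_tail_truncate(document)
```

Under these assumptions the middle of the document is discarded, which may be reasonable where the opening and closing parts of a text carry most of the summary-relevant content.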