GCN-based document representation for keyphrase generation enhanced by maximizing mutual information

Peng Yang, Yanyan Ge, Yu Yao, Ying Yang

doi:10.1016/j.knosys.2022.108488

Abstract

Keyphrase generation is a fundamental task in natural language processing that helps users quickly obtain valuable information from a large number of documents, especially when they face informal social media text. Existing keyphrase generation approaches based on Recurrent Neural Networks (RNNs) cannot properly model the dependency structure of informal text, which often holds implicitly between distant words and plays an important role in extracting salient information. To obtain the core features of a text, we apply a Graph Convolutional Network (GCN) to a document-level graph to capture dependency structure information. The GCN-based node representations are then fed into a predictor network that provides potential candidates for the copying mechanism. Moreover, we use a novel variational selector network to determine the final selection probability of each word in a phrase, which depends on its probability of being copied from the given document and its probability of being generated from a vocabulary. Finally, we introduce an enhancement mechanism that maximizes the mutual information between the document and the generated keyphrases, thus ensuring consistency between them. Experimental results show that our model outperforms previous state-of-the-art baselines on three social media datasets: Weibo, Twitter, and StackExchange.
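To make the graph-convolution step concrete, the following minimal sketch applies a single GCN layer to a toy three-word document graph. It assumes the widely used symmetric-normalization form, ReLU(D^-1/2 (A + I) D^-1/2 H W); the abstract does not state which GCN variant the model uses, and the names `gcn_layer`, `matmul`, `adj`, `feats`, and `weight` are hypothetical, chosen only for illustration.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def gcn_layer(adj, feats, weight):
    """One graph-convolution step (assumed form, not confirmed by the paper):
    ReLU(D^-1/2 (A + I) D^-1/2 . H . W)."""
    n = len(adj)
    # Add self-loops so each word node keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    # Symmetric normalization by node degree (degrees include the self-loop).
    deg = [sum(row) for row in a_hat]
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    # Aggregate neighbor features, apply the linear map, then ReLU.
    out = matmul(matmul(norm, feats), weight)
    return [[max(0.0, v) for v in row] for row in out]


# Toy document graph: word 0 -- word 1 -- word 2 (hypothetical dependency edges).
adj = [[0.0, 1.0, 0.0],
       [1.0, 0.0, 1.0],
       [0.0, 1.0, 0.0]]
feats = [[1.0, 0.0, 0.0],   # one-hot node features, for illustration only
         [0.0, 1.0, 0.0],
         [0.0, 0.0, 1.0]]
weight = [[1.0, 0.0, 0.0],  # identity weights, so the output shows pure aggregation
          [0.0, 1.0, 0.0],
          [0.0, 0.0, 1.0]]

node_repr = gcn_layer(adj, feats, weight)
```

With identity features and weights, each row of `node_repr` shows how strongly a word draws on itself and its graph neighbors. Stacking several such layers lets information flow between words that are far apart in the token sequence but close in the graph, which is the property the abstract relies on.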

Full Text