Abstract
Paraphrase generation is an essential yet challenging task in natural language processing. Neural-network-based approaches to paraphrase generation have achieved remarkable success in recent years. Previous neural paraphrase generation approaches ignore linguistic knowledge, such as part-of-speech (POS) information, regardless of its availability. The underlying assumption is that neural nets could learn such information implicitly when given sufficient data. However, it is difficult for neural nets to learn such information properly when data are scarce. In this work, we probe the efficacy of explicit part-of-speech information for the task of paraphrase generation in low-resource scenarios. To this end, we devise three mechanisms to fuse part-of-speech information under the framework of sequence-to-sequence learning. We demonstrate the utility of part-of-speech information in low-resource paraphrase generation through extensive experiments on multiple datasets of varying sizes and genres.
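The abstract does not detail the three fusion mechanisms, but one common way to inject POS information into a sequence-to-sequence encoder is to concatenate a learned POS-tag embedding with each word embedding before encoding. A minimal NumPy sketch of that idea follows; all vocabularies, dimensions, and the concatenation scheme are illustrative assumptions, not the paper's actual method:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative vocabularies and randomly initialized embedding tables.
word_vocab = {"the": 0, "cat": 1, "sleeps": 2}
pos_vocab = {"DET": 0, "NOUN": 1, "VERB": 2}
word_dim, pos_dim = 8, 4
word_emb = rng.normal(size=(len(word_vocab), word_dim))
pos_emb = rng.normal(size=(len(pos_vocab), pos_dim))

def fuse_by_concatenation(words, tags):
    """Concatenate each word vector with its POS-tag vector,
    producing the fused encoder input sequence."""
    w = word_emb[[word_vocab[t] for t in words]]   # shape (T, word_dim)
    p = pos_emb[[pos_vocab[t] for t in tags]]      # shape (T, pos_dim)
    return np.concatenate([w, p], axis=-1)         # shape (T, word_dim + pos_dim)

inputs = fuse_by_concatenation(["the", "cat", "sleeps"], ["DET", "NOUN", "VERB"])
print(inputs.shape)  # (3, 12)
```

The fused matrix would then be fed to the encoder in place of the plain word embeddings; other plausible mechanisms include summing projected POS vectors into the word vectors or attending over them separately.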
Highlights
Polysemy poses a great challenge for natural language processing
In neural-network-based natural language processing models, input words are often represented as word vectors. The most commonly used off-the-shelf word vectors include word2vec [1] and GloVe [2], both of which are trained on large corpora via unsupervised learning algorithms
For a word with multiple senses, its word vector is a weighted average of the representations for different senses, with the weights depending on how often the corresponding senses occur in the training corpus
Summary
Polysemy poses a great challenge for natural language processing. In neural-network-based natural language processing models, input words are often represented as word vectors (or word embeddings). The most commonly used off-the-shelf word vectors include word2vec [1] and GloVe [2], both of which are trained on large corpora via unsupervised learning algorithms. Both embedding models provide a single dense (distributed) vector as the semantic representation of each word. An obvious drawback of this averaged representation is that when a word is encountered in an NLP task with a less common sense, it is still represented by a vector dominated by its most common sense. This can cause significant difficulties in the learning of NLP models, including paraphrase generation.
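The averaging effect described above can be made concrete: if a word's static vector is a frequency-weighted average of its sense vectors, the rarer sense ends up poorly represented. A toy NumPy sketch with made-up sense vectors and frequencies (purely illustrative; not derived from actual word2vec or GloVe vectors):

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical sense vectors for the polysemous word "bank".
financial_sense = np.array([1.0, 0.0])   # "money" direction
river_sense     = np.array([0.0, 1.0])   # "riverside" direction

# Static embedding = frequency-weighted average of the sense vectors
# (assume the financial sense accounts for 90% of corpus occurrences).
weights = np.array([0.9, 0.1])
bank_vec = weights[0] * financial_sense + weights[1] * river_sense

# The single vector sits almost on top of the common sense,
# while the rare sense is nearly orthogonal to it.
print(round(cosine(bank_vec, financial_sense), 3))  # 0.994
print(round(cosine(bank_vec, river_sense), 3))      # 0.110
```

When "bank" appears in its riverside sense, a model consuming this single vector receives a representation that is almost entirely about finance, which is exactly the difficulty for downstream tasks such as paraphrase generation.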