Paraphrase Identification Based on Weighted URAE, Unit Similarity and Context Correlation Feature

Jie Zhou, Huanrong Sun, Gongshen Liu

doi:10.1007/978-3-319-99501-4_4

Abstract

This paper proposes a deep learning model that handles both sentence-level and article-level paraphrase identification. It combines a pairwise unit similarity feature with a semantic context correlation feature. Sentences are represented by word and phrase embeddings, while articles are represented by sentence embeddings. The phrase and sentence embeddings are learned from parse trees by Weighted Unfolding Recursive Autoencoders (WURAE), an unsupervised learning algorithm. A unit similarity matrix is then computed by matching the two lists of embeddings, and the pairwise unit similarity feature is extracted from it through CNN and k-max pooling layers. The semantic context correlation feature is captured by a combination of CNN and LSTM: the CNN layers learn collocation information between adjacent units, while the LSTM extracts long-term dependency features of the text from the CNN output. The model is evaluated on a widely used English sentence paraphrase corpus, MSRPC, and on a Chinese article paraphrase corpus. The results show that deep semantic features of text can be extracted with WURAE, unit similarity, and context correlation features. We release our WURAE code, the deep learning model for paraphrase identification, and pre-trained phrase and sentence embedding data for use by the community.
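The unit similarity matrix and the k-max pooling step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' released code: the abstract does not state which similarity measure is used, so cosine similarity is an assumption here, and the function names `cosine`, `unit_similarity_matrix`, and `k_max_pooling` are hypothetical. The pooling follows the common definition of k-max pooling (keep the k largest values in their original order); the CNN layers that sit between the two steps in the paper are omitted.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (assumed measure).

    Returns 0.0 if either vector is all zeros.
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def unit_similarity_matrix(units_a, units_b):
    """Match every unit embedding of text A against every unit of text B.

    Entry [i][j] is the similarity between unit i of A and unit j of B.
    """
    return [[cosine(u, v) for v in units_b] for u in units_a]


def k_max_pooling(values, k):
    """Keep the k largest values, preserving their original order."""
    top = sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:k]
    return [values[i] for i in sorted(top)]


# Toy 2-dimensional "embeddings"; real ones would come from WURAE.
units_a = [[1.0, 0.0], [0.0, 1.0]]
units_b = [[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]]

matrix = unit_similarity_matrix(units_a, units_b)
pooled = [k_max_pooling(row, 2) for row in matrix]
```

Because the pooled output has a fixed size regardless of how many units each text contains, a step of this kind lets texts of different lengths feed the same downstream layers.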

Similar Papers

Modeling Paraphrase Identification Using Supervised Learning Methods Against Various Datasets and Features
Rutal S Mahajan ... Mukesh A Zaveri
01 Dec 2017

Constructing a Turkish Corpus for Paraphrase Identification and Semantic Similarity
Asli Eyecioglu ... Bill Keller
01 Jan 2018

Paraphrase identification and semantic text similarity analysis in Arabic news tweets using lexical, syntactic, and semantic features
Mohammad Al-Smadi ... Yaser Jararweh
Information Processing & Management | VOL. 53
30 Jan 2017

Low-Level Features for Paraphrase Identification
Ekaterina Pronoza ... Elena Yagunova
01 Jan 2015
