Abstract

Paraphrase identification (PI) aims to judge whether two sentences have the same meaning, and the semantic representation of sentences is its foundation. Most existing paraphrase identification methods use pre-trained word vectors as model input, but such vectors ignore the polysemy and synonymy that are common in Chinese. To improve the accuracy of paraphrase identification on the LCQMC and BQ datasets, we use sememe knowledge from HowNet to enhance the semantic representation of polysemous words and synonyms. In the text representation stage, two sememe knowledge representation methods are used to enhance the semantic representation. In the semantic distance measurement stage, Euclidean distance, Manhattan distance, and cosine distance are used to measure the semantic relationship between sentences. The proposed model is evaluated on two Chinese datasets, LCQMC and BQ, together with ablation experiments and an experimental analysis of the sememe knowledge representations. The results show that the accuracy of the model is about 1% higher than that of typical models, and that introducing sememe knowledge enhances the semantic representation of polysemous words and synonyms, thereby improving the accuracy of paraphrase identification.
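
The three distance measures named in the abstract can be illustrated with a minimal sketch. It assumes each sentence has already been encoded as a fixed-length vector of floats. The function names and the idea of returning the three values as one feature list are assumptions made for illustration; the abstract does not say how the model combines the distances.

```python
import math


def euclidean_distance(u, v):
    """Straight-line (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def manhattan_distance(u, v):
    """Sum of absolute coordinate differences (L1 distance)."""
    return sum(abs(a - b) for a, b in zip(u, v))


def cosine_distance(u, v):
    """One minus the cosine similarity of the two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


def distance_features(u, v):
    """Hypothetical helper: the three distances as one feature list."""
    if len(u) != len(v):
        raise ValueError("sentence vectors must have the same length")
    return [
        euclidean_distance(u, v),
        manhattan_distance(u, v),
        cosine_distance(u, v),
    ]
```

Euclidean and Manhattan distance are sensitive to vector magnitude, while cosine distance depends only on direction, so the three measures capture complementary views of how far apart two sentence representations are.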
