Abstract

Recent studies have shown that word embeddings are effective in several natural language processing (NLP) tasks. Although paraphrase identification in Turkish has been well studied with traditional statistical NLP methods, to the best of our knowledge there is no study in which word and/or sentence embeddings are employed. In this paper, three well-known methods for building sentence embeddings from word embeddings, namely averaging word embeddings (AWE), concatenating word embeddings (CWE), and word mover's distance over word embeddings (WMDWE), are examined, and their effect on the performance of paraphrase identification is measured. The results are presented comparatively for English (MSRP) and Turkish (PARDER and TuPC) paraphrase corpora. The study does not cover the optimization of the parameters used in training the word embeddings, and features specific to the Turkish language are not considered. Despite this naive approach, the test results obtained on the PARDER corpus are promising and suggest that a more detailed study incorporating such improvements may yield more convincing performance values.
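As a rough illustration of the simplest of the three approaches (AWE), the sketch below builds a sentence embedding as the mean of its word vectors and scores a candidate paraphrase pair by cosine similarity. It is not the paper's implementation; the `embeddings` lookup, the 300-dimensional vector size, and the cosine-based scoring are illustrative assumptions.

```python
# Minimal sketch of the AWE idea: a sentence embedding is the average of the
# word vectors of its tokens. The embedding dictionary and its dimensionality
# are assumed here, not taken from the paper.
import numpy as np


def average_word_embedding(tokens, embeddings, dim=300):
    """Average the vectors of in-vocabulary tokens; return a zero vector if none are found."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return np.zeros(dim)
    return np.mean(vectors, axis=0)


def cosine_similarity(a, b):
    """Cosine similarity between two sentence embeddings, used to score a paraphrase pair."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0
```

A pair would then be labeled a paraphrase when the similarity exceeds a threshold tuned on the training portion of the corpus; CWE and WMDWE replace the averaging step with concatenation and word mover's distance, respectively.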
