Comparison of Pre-trained vs Custom-trained Word Embedding Models for Word Sense Disambiguation

Muhammad Farhat Ullah, Naveed Hussain, Ali Saeed

doi:10.14201/adcaij.31084

Abstract

The primary objective of word sense disambiguation (WSD) is to build systems that can automatically recognize the intended meaning (sense) of an ambiguous word in a sentence. WSD can benefit a range of natural language processing (NLP) and human-computer interaction (HCI) applications. Researchers have explored a wide variety of methods to resolve sense ambiguity, but their focus has mostly been on English and a few other widely studied languages. Urdu, with more than 300 million users and a large amount of electronic text available on the web, remains largely unexplored. In recent years, word embedding methods have proven extremely successful for a variety of NLP tasks. This study applies, evaluates, and compares a variety of word embedding approaches for Urdu WSD (both Lexical Sample and All-Words), including pre-trained models (Word2Vec, GloVe, and FastText) as well as custom-trained models (Word2Vec, GloVe, and FastText trained on the Ur-Mono corpus). Two benchmark corpora are used for the evaluation: (1) the UAW-WSD-18 corpus and (2) the ULS-WSD-18 corpus. For the Urdu All-Words WSD task, the best results (Accuracy=60.07 and F1=0.45) were achieved using pre-trained FastText. For the Lexical Sample WSD task, the best results (Accuracy=70.93 and F1=0.60) were achieved using the custom-trained GloVe word embedding method.
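The abstract does not spell out how an embedding is turned into a sense decision, so the following is only a minimal sketch of one common embedding-based approach, not necessarily the procedure used in the study. It assumes each candidate sense comes with a short list of gloss words, averages the word vectors of the context and of each gloss, and picks the sense with the highest cosine similarity. The function names, the English toy vocabulary, and the 3-dimensional vectors are all hypothetical and invented for illustration; in practice the vectors would come from a pre-trained or custom-trained Word2Vec, GloVe, or FastText model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def average(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def disambiguate(context_words, sense_glosses, embeddings):
    """Return the sense whose averaged gloss vector is closest to the
    averaged context vector, or None if nothing can be embedded."""

    def embed(words):
        known = [embeddings[w] for w in words if w in embeddings]
        return average(known) if known else None

    context_vec = embed(context_words)
    if context_vec is None:
        return None

    best_sense, best_score = None, -2.0  # cosine is always >= -1
    for sense, gloss_words in sense_glosses.items():
        gloss_vec = embed(gloss_words)
        if gloss_vec is None:
            continue
        score = cosine(context_vec, gloss_vec)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense


# Toy vectors, invented for illustration only (assumption, not real data).
TOY_EMBEDDINGS = {
    "money": [0.9, 0.1, 0.0],
    "loan": [0.8, 0.2, 0.1],
    "deposit": [0.85, 0.15, 0.05],
    "river": [0.1, 0.9, 0.2],
    "water": [0.0, 0.8, 0.3],
    "shore": [0.1, 0.85, 0.25],
}

# Hypothetical sense inventory for the ambiguous word "bank".
SENSES_OF_BANK = {
    "bank_financial": ["money", "deposit"],
    "bank_river": ["water", "shore"],
}

if __name__ == "__main__":
    print(disambiguate(["loan", "money"], SENSES_OF_BANK, TOY_EMBEDDINGS))
```

Swapping the toy dictionary for vectors loaded from a different embedding model leaves the rest of the sketch unchanged, which is what makes a like-for-like comparison of pre-trained and custom-trained embeddings straightforward to set up.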

Journal: ADCAIJ: Advances in Distributed Computing and Artificial Intelligence Journal
Publication Date: Nov 1, 2023
License type: CC BY-NC-ND 4.0
