Building Sense Tagged Corpus Using Wikipedia for Supervised Word Sense Disambiguation

Abdulgabbar Saif, Nazlia Omar, Ummi Zakiah Zainodin, Mohd Juziaddin Ab Aziz

doi:10.1016/j.procs.2018.01.062

Open Access

Abstract

Building sense-tagged data is a major challenge for supervised techniques, which have achieved promising results in word sense disambiguation. Manually building sense-tagged data is labor-intensive and time-consuming because linguistic experts must label each ambiguous word in the collected contexts. This paper therefore proposes a knowledge-based method for building an Arabic sense-tagged corpus from Wikipedia. The method starts by mapping Arabic WordNet to Wikipedia in order to select the Wikipedia article that corresponds to each WordNet sense. In this mapping step, a cross-lingual method is used to measure the similarity between the features of a Wikipedia article and those of a WordNet sense separately. The incoming links of Wikipedia articles are then exploited to extract instances for each sense of each ambiguous word in WordNet. To handle the lack of instances for some Wikipedia articles, a multiword-based technique is proposed to increase the number of instances for each concept. Experimental results show that the cross-lingual method outperforms a monolingual method based on Arabic features only. The resulting sense-tagged corpus covers 50 ambiguous words, yielding 148 senses with 30,961 instances.
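
The two core steps described in the abstract (mapping a sense to an article, then harvesting instances from incoming links) can be illustrated with a minimal sketch. This is not the authors' implementation: the similarity here is a plain bag-of-words cosine rather than the paper's cross-lingual feature comparison, the data is toy English text rather than Arabic, and the function names `map_sense_to_article` and `extract_instances` are hypothetical. The multiword-based expansion is not covered.

```python
import re
from collections import Counter
from math import sqrt

# Matches wikitext links of the form [[Title]] or [[Title|anchor text]].
LINK = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]")

def bag_of_words(text):
    """Lowercased word counts; a stand-in for the paper's richer features."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two Counter vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def map_sense_to_article(sense_gloss, candidate_articles):
    """Return the title of the candidate article most similar to the sense gloss.

    candidate_articles maps article title -> article text (assumed input shape).
    """
    sense_vec = bag_of_words(sense_gloss)
    return max(
        candidate_articles,
        key=lambda title: cosine(sense_vec, bag_of_words(candidate_articles[title])),
    )

def extract_instances(sentences, target_title):
    """Collect (tagged word, plain sentence) pairs from sentences linking to target_title."""
    instances = []
    for sentence in sentences:
        for match in LINK.finditer(sentence):
            title = match.group(1).strip()
            anchor = (match.group(2) or match.group(1)).strip()
            if title == target_title:
                # Replace every link with its visible text to get a plain context.
                plain = LINK.sub(lambda m: m.group(2) or m.group(1), sentence)
                instances.append((anchor, plain))
    return instances

# Toy data (assumed for illustration only).
candidates = {
    "Bank (finance)": "A bank is a financial institution that accepts deposits and makes loans",
    "Bank (river)": "A river bank is the land alongside a river or stream",
}
sentences = [
    "She deposited money at the [[Bank (finance)|bank]] downtown.",
    "They walked along the [[Bank (river)|bank]] at dusk.",
]

article = map_sense_to_article("financial institution that accepts deposits", candidates)
tagged = extract_instances(sentences, article)
```

With this toy input, `article` is `"Bank (finance)"` and `tagged` holds the single sentence whose link points to that article, with the anchor text `bank` as the sense-tagged word. The key idea carried over from the abstract is that a link's target article acts as the sense label for its anchor text, so no manual annotation is needed once senses are mapped to articles.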

Journal: Procedia Computer Science
Publication Date: Jan 1, 2018
Citations: 7
License type: cc-by-nc-nd
