Abstract

The success of Natural Language Processing (NLP) models, like that of all advanced machine learning models, relies heavily on large-scale lexical resources. For English, English WordNet (EWN) is a leading example of a large-scale resource that has enabled advances in Natural Language Understanding (NLU) tasks such as word sense disambiguation, question answering, sentiment analysis, and emotion recognition. EWN consists of sets of cognitive synonyms called synsets, each expressing a distinct concept, which are interlinked by conceptual-semantic and lexical relations. However, other languages still lag behind in large-scale, rich lexical resources comparable to EWN. In this article, we focus on enabling the development of such resources for Arabic. While there have been efforts to develop an Arabic WordNet (AWN), the current version of AWN is limited in size and lacks transliteration standards, which are important for compatibility with Arabic NLP tools. Previous efforts to extend AWN produced a lexicon, called ArSenL, that overcame the size and transliteration-standard limitations but suffered in accuracy: its heuristic approach considered only surface matching between English definitions from the Standard Arabic Morphological Analyzer (SAMA) and EWN synset terms, resulting in inaccurate mappings of Arabic lemmas to EWN synsets. Furthermore, other expansion methods have seen limited exploration because of the expensive manual validation they require. To simultaneously achieve large scale, high accuracy, and standard representations, we formulate the mapping problem as a link prediction problem between a large-scale Arabic lexicon and EWN, where a word in one lexicon is linked to a word in the other if the two words are semantically related. We use a semi-supervised approach to create a training dataset by finding terms common to the large-scale Arabic resource and AWN; these terms are implicitly linked to EWN and can be used for training and evaluating prediction models. We propose a two-step Boosting method: the first step links English translations of SAMA terms to EWN synsets, and the second step uses surface similarity between SAMA glosses and EWN synsets. The method yields a new large-scale Arabic lexicon that we call ArSenL 2.0, a sequel to the previously developed sentiment lexicon ArSenL. A comprehensive study covering both intrinsic and extrinsic evaluations shows that the method outperforms several baseline and state-of-the-art link prediction methods. Compared to the original ArSenL, ArSenL 2.0 includes a larger set of sentimentally charged adjectives and verbs and achieves higher linking accuracy on the ground-truth data. For extrinsic evaluation, ArSenL 2.0 was used for sentiment analysis and here, too, showed higher accuracy than the original ArSenL.
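
The sketch below is only a rough illustration of the two-step idea summarized above, using NLTK's English WordNet interface: step 1 collects candidate EWN synsets reachable through an entry's English translations, and step 2 ranks them by a surface-overlap score between the entry's English gloss and each candidate synset. The `entry` dictionary, its field names, the Jaccard scoring, and the threshold are hypothetical simplifications introduced here for illustration; the actual ArSenL 2.0 pipeline trains a Boosting model over such signals rather than thresholding a single overlap score.

```python
# Illustrative sketch only (not the paper's implementation).
# Requires: pip install nltk; then nltk.download("wordnet") once.
import re

from nltk.corpus import wordnet as wn


def _tokens(text):
    """Lowercased word tokens, punctuation stripped."""
    return set(re.findall(r"[a-z]+", text.lower()))


def candidate_synsets(translations, pos=None):
    """Step 1: collect EWN synsets whose lemmas match any English translation."""
    candidates = set()
    for word in translations:
        candidates.update(wn.synsets(word.replace(" ", "_"), pos=pos))
    return candidates


def surface_similarity(gloss, synset):
    """Step 2 signal: Jaccard overlap between the entry's gloss tokens and the
    synset's lemma names plus definition tokens (a crude surface proxy)."""
    gloss_tokens = _tokens(gloss)
    synset_tokens = _tokens(synset.definition()) | _tokens(" ".join(synset.lemma_names()))
    if not gloss_tokens or not synset_tokens:
        return 0.0
    return len(gloss_tokens & synset_tokens) / len(gloss_tokens | synset_tokens)


def link_entry(entry, threshold=0.1):
    """Link one Arabic lexicon entry to its best-scoring EWN synset, if any."""
    candidates = candidate_synsets(entry["translations"], pos=entry.get("pos"))
    scored = [(surface_similarity(entry["gloss"], s), s) for s in candidates]
    scored = [(score, s) for score, s in scored if score >= threshold]
    return max(scored, key=lambda pair: pair[0], default=(0.0, None))


# Hypothetical SAMA-style entry for an Arabic adjective glossed as "generous; noble".
entry = {"translations": ["generous", "noble"], "gloss": "generous; noble", "pos": wn.ADJ}
score, best = link_entry(entry)
print(best, score)
```

In the actual pipeline, translation-based candidates and gloss-level surface similarity would serve as inputs to the trained Boosting model rather than being combined by a fixed threshold as done above.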
