Creating and Validating Multilingual Semantic Representations for Six Languages: Expert versus Non-Expert Crowds

Mahmoud El-Haj, Scott Piao, Stephen Wattam, Paul Rayson

doi:10.18653/v1/w17-1908

Abstract

Creating high-quality, wide-coverage multilingual semantic lexicons to support knowledge-based approaches is a challenging and time-consuming manual task. This has traditionally been performed by linguistic experts: a slow and expensive process. We present an experiment in which we adapt and evaluate crowdsourcing methods employing native speakers to generate a list of coarse-grained senses under a common multilingual semantic taxonomy for sets of words in six languages. 451 non-experts (including 427 Mechanical Turk workers) and 15 expert participants semantically annotated 250 words manually for Arabic, Chinese, English, Italian, Portuguese and Urdu lexicons. To filter out erroneous (spam) crowdsourced results, we used a novel task-specific two-phase filtering process in which users were asked to identify synonyms in the target language and to remove erroneous senses.
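The two-phase filter mentioned in the abstract (identify a synonym, then remove erroneous senses) can be illustrated with a short sketch. This is not the authors' implementation: it assumes each target word comes with one known-correct synonym for phase 1 and one deliberately wrong sense planted among the candidate tags for phase 2, and that only annotations from workers who pass both checks are pooled and ranked by frequency. The names `Submission`, `passes_filter`, and `aggregate`, the pooling rule, and the tag labels are hypothetical; the cap of 10 tags per word follows the task instructions quoted in the highlights.

```python
from collections import Counter
from dataclasses import dataclass

MAX_TAGS = 10  # per-word tag cap taken from the task instructions


@dataclass
class Submission:
    """One worker's (hypothetical) answer for a single target word."""
    worker_id: str
    chosen_synonym: str   # phase 1: synonym the worker selected
    removed_tags: set     # phase 2: tags the worker deleted
    tags: list            # tags the worker kept or added, in order


def passes_filter(sub, gold_synonym, planted_wrong_tag):
    """Assumed two-phase check: correct synonym, then removal of a planted wrong sense."""
    if sub.chosen_synonym != gold_synonym:  # phase 1 failed
        return False
    # Phase 2: the deliberately erroneous sense must be removed and not re-added.
    return planted_wrong_tag in sub.removed_tags and planted_wrong_tag not in sub.tags


def aggregate(subs, gold_synonym, planted_wrong_tag, max_tags=MAX_TAGS):
    """Pool tags from workers who pass both phases; most frequent tag first."""
    counts = Counter()
    for sub in subs:
        if passes_filter(sub, gold_synonym, planted_wrong_tag):
            # Count each distinct tag once per worker, ignoring anything past the cap.
            counts.update(list(dict.fromkeys(sub.tags[:max_tags])))
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))  # ties broken alphabetically
    return [tag for tag, _ in ranked[:max_tags]]


# Hypothetical submissions for the target word "glad".
subs = [
    Submission("w1", "happy", {"FOOD"}, ["HAPPY", "EMOTIONAL"]),
    Submission("w2", "happy", {"FOOD"}, ["HAPPY"]),
    Submission("w3", "sad", {"FOOD"}, ["SPAM"]),         # fails phase 1
    Submission("w4", "happy", set(), ["SPAM", "FOOD"]),  # fails phase 2
]
print(aggregate(subs, gold_synonym="happy", planted_wrong_tag="FOOD"))
# → ['HAPPY', 'EMOTIONAL']
```

Sorting by descending count mirrors the instruction that the most commonly used tag appears at the top of the list; a real pipeline might instead require a minimum number of agreeing workers before accepting a tag.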

Highlights

Machine understanding of the meaning of words, phrases, sentences and documents has challenged computational linguists since the 1950s, and much progress has been made at multiple levels.
We evaluate how efficient the approach is, and how robust the semantic representation is across six languages.
This will add an entry to the list, which can be sorted so that the most commonly used tag is at the top. Please remove any unrelated tags and make sure you do not exceed 10 tags in total. To help you identify common senses of a word, we have provided a number of links to dictionaries, thesauri, and corpora.

Summary

Introduction

Machine understanding of the meaning of words, phrases, sentences and documents has challenged computational linguists since the 1950s, and much progress has been made at multiple levels. Common to all of these tasks, in the supervised setting, is the requirement for a wide-coverage semantic lexicon acting as a knowledge base from which to select or derive potential word- or phrase-level sense annotations. The creation of large-scale semantic lexical resources is a time-consuming and difficult task, and for regional varieties, dialects, or domains it will need to be repeated and then revised over time as word meanings evolve. We report on work in which we adapt crowdsourcing techniques to speed up the creation of new semantic lexical resources. We evaluate how efficient the approach is, and how robust the semantic representation is across six languages.

Publication Date: Jan 1, 2017
Citations: 24
License type: cc-by

Similar Papers

Multilingual representations for low resource speech recognition and keyword search
Jia Cui ... Bhuvana Ramabhadran
01 Dec 2015

Semantic Drift in Multilingual Representations
Lisa Beinborn ... Rochelle Choenni
Computational Linguistics | VOL. 46
01 Nov 2020

Grapheme-based Automatic Speech Recognition using Probabilistic Lexical Modeling
01 Jan 2014

Impact of Anxiety in English Language Learning of Second Language Learners
Central European Management Journal
01 Jan 2021