Abstract

Semantic representation lies at the core of several applications in Natural Language Processing. However, most existing semantic representation techniques cannot effectively represent individual word senses. We put forward a novel multilingual concept representation, called MUFFIN, which not only enables accurate representation of word senses in different languages, but also provides multiple advantages over existing approaches. MUFFIN represents a given concept in a unified semantic space irrespective of the language of interest, enabling cross-lingual comparison of different concepts. We evaluate our approach on two evaluation benchmarks, semantic similarity and Word Sense Disambiguation, and report state-of-the-art performance on several standard datasets.

Highlights

  • Semantic representation, i.e., the task of representing a linguistic item in a mathematical or machine-interpretable form, is a fundamental problem in Natural Language Processing (NLP)

  • These approaches, whether in their conventional co-occurrence-based form (Salton et al., 1975; Turney and Pantel, 2010; Landauer and Dooley, 2002) or in their newer predictive branch (Collobert and Weston, 2008; Mikolov et al., 2013; Baroni et al., 2014), suffer from a major drawback: they are unable to model individual word senses or concepts, as they conflate different meanings of a word into a single vectorial representation. This hinders the functionality of this group of vector space models in tasks such as Word Sense Disambiguation (WSD) that require the representation of individual word senses

  • We evaluate our semantic representation on two different tasks in lexical semantics: semantic similarity and Word Sense Disambiguation


Summary

Introduction

Semantic representation, i.e., the task of representing a linguistic item (such as a word or a word sense) in a mathematical or machine-interpretable form, is a fundamental problem in Natural Language Processing (NLP). The prevailing methods for the computation of a vector space representation are based on distributional semantics (Harris, 1954). These approaches, whether in their conventional co-occurrence-based form (Salton et al., 1975; Turney and Pantel, 2010; Landauer and Dooley, 2002) or in their newer predictive branch (Collobert and Weston, 2008; Mikolov et al., 2013; Baroni et al., 2014), suffer from a major drawback: they are unable to model individual word senses or concepts, as they conflate different meanings of a word into a single vectorial representation. Moreover, the applicability of all these techniques is usually constrained either to a single language (usually English) or to a specific task.
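The sense-conflation problem described above can be illustrated with a minimal sketch of a conventional co-occurrence model. The toy corpus, window choice, and helper function below are illustrative assumptions, not the paper's actual setup: a polysemous word like "bank" receives one count vector that mixes its financial and river senses, so the model has no way to represent either sense separately.

```python
import math
from collections import Counter, defaultdict

# Toy corpus in which "bank" appears in both its financial and river senses.
corpus = [
    "the bank approved the loan and the deposit",
    "money deposit at the bank earns interest",
    "the river bank was muddy after the rain",
    "fish swim near the river bank and the water",
]

# Build simple co-occurrence count vectors (context window = whole sentence).
vectors = defaultdict(Counter)
for sentence in corpus:
    tokens = sentence.split()
    for i, word in enumerate(tokens):
        for j, context in enumerate(tokens):
            if i != j:
                vectors[word][context] += 1

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[k] for k, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# "bank" has a single vector: it is similar to context words from *both*
# senses at once, and nothing in the representation separates them.
print(cosine(vectors["bank"], vectors["loan"]))
print(cosine(vectors["bank"], vectors["river"]))
```

Sense-level representations such as the one the paper proposes avoid this by assigning a separate vector to each concept rather than to each surface word.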

