Word Embeddings and Semantic Spaces in Natural Language Processing

Peter J Worth

doi:10.4236/ijis.2023.131001

Abstract

One of the critical hurdles, and breakthroughs, in the field of Natural Language Processing (NLP) in the last two decades has been the development of techniques for text representation that solve the so-called curse of dimensionality, a problem that plagues NLP in general because the feature set for learning starts as a function of the size of the language in question, typically upwards of hundreds of thousands of terms. As such, much of the research and development in NLP over this period has gone into finding and optimizing solutions to this problem, which is effectively a problem of feature selection. This paper looks at the development of these techniques, which leverage a variety of statistical methods that rest on linguistic theories advanced in the middle of the last century, namely the distributional hypothesis, which suggests that words found in similar contexts generally have similar meanings. In this survey paper we look at the development of some of the most popular of these techniques from a mathematical as well as a data structure perspective, from Latent Semantic Analysis to Vector Space Models to their more modern variants, which are typically referred to as word embeddings. In this review of algorithms such as Word2Vec, GloVe, ELMo and BERT, we explore the idea of semantic spaces more generally, beyond their applicability to NLP.
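
As a rough illustration of the distributional hypothesis mentioned in the abstract, the following minimal sketch builds raw co-occurrence count vectors and compares them with cosine similarity. The toy corpus, the window size, and the helper names `context_vectors` and `cosine` are assumptions made for this example; they are not taken from the paper, and the sketch does not reproduce any of the surveyed algorithms.

```python
from collections import Counter
from math import sqrt

# Toy corpus (an assumption for illustration; not data from the paper).
corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the mouse",
    "the dog chases the ball",
]

def context_vectors(sentences, window=2):
    """Count, for each word, the words seen within `window` positions of it."""
    vectors = {}
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            context = tokens[lo:i] + tokens[i + 1:hi]
            vectors.setdefault(word, Counter()).update(context)
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

vectors = context_vectors(corpus)
sim_cat_dog = cosine(vectors["cat"], vectors["dog"])
sim_cat_milk = cosine(vectors["cat"], vectors["milk"])
print(f"cat~dog:  {sim_cat_dog:.3f}")
print(f"cat~milk: {sim_cat_milk:.3f}")
```

In this toy setting "cat" and "dog" occur in nearly identical contexts, so their count vectors score as more similar than those of "cat" and "milk". Each vector here has one dimension per vocabulary word, which is the high-dimensional representation that the embedding techniques surveyed in the paper aim to compress.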

Journal: International Journal of Intelligence Science	Publication Date: Jan 1, 2023
Citations: 12	License type: CC BY 4.0

Similar Papers

Anaphora and coreference resolution: A review
Rhea Sukthanker ... Ramkumar Thirunavukarasu
Information Fusion | VOL. 59
01 Feb 2020

Text Mining: Text Representation
Rosarina Vallelunga ... Ileana Scarpino
Reference Module in Life Sciences | VOL. -
01 Jan 2024

Word Embeddings for Natural Language Processing
01 Jan 2015

Graph-Based Natural Language Processing and Information Retrieval Rada Mihalcea and Dragomir Radev (University of North Texas and University of Michigan) Cambridge, UK: Cambridge University Press, 2011, viii+192 pp; hardbound, ISBN 978-0-521-89613-9, $65.00
Chris Biemann
Computational Linguistics | VOL. 38
01 Mar 2012
