Extracting semantic representations from word co-occurrence statistics: stop-lists, stemming, and SVD

John A Bullinaria, Joseph P Levy

doi:10.3758/s13428-011-0183-8

Abstract

In a previous article, we presented a systematic computational study of the extraction of semantic representations from the word-word co-occurrence statistics of large text corpora. The conclusion was that semantic vectors of pointwise mutual information values from very small co-occurrence windows, together with a cosine distance measure, consistently resulted in the best representations across a range of psychologically relevant semantic tasks. This article extends that study by investigating the use of three further factors--namely, the application of stop-lists, word stemming, and dimensionality reduction using singular value decomposition (SVD)--that have been used to provide improved performance elsewhere. It also introduces an additional semantic task and explores the advantages of using a much larger corpus. This leads to the discovery and analysis of improved SVD-based methods for generating semantic representations (that provide new state-of-the-art performance on a standard TOEFL task) and the identification and discussion of problems and misleading results that can arise without a full systematic study.
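The pipeline the abstract describes (co-occurrence counts from a very small window, pointwise mutual information weighting, cosine comparison, and optional stop-list filtering and SVD reduction) can be sketched roughly as follows. This is a minimal illustration on a toy corpus, not the authors' implementation: the corpus, the one-word stop-list, the window size of 1, the clipping of negative PMI values to zero, and all function names are assumptions made for the example.

```python
import numpy as np

def cooccurrence_counts(tokens, vocab, window=1):
    """Count how often each vocab word appears within `window` positions of another."""
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(vocab)))
    for pos, word in enumerate(tokens):
        if word not in index:
            continue
        lo, hi = max(0, pos - window), min(len(tokens), pos + window + 1)
        for ctx_pos in range(lo, hi):
            ctx = tokens[ctx_pos]
            if ctx_pos != pos and ctx in index:
                counts[index[word], index[ctx]] += 1
    return counts

def ppmi(counts):
    """PMI of each (word, context) pair; negative and undefined values set to 0 (assumption)."""
    total = counts.sum()
    row = counts.sum(axis=1, keepdims=True)
    col = counts.sum(axis=0, keepdims=True)
    expected = row * col / total
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log2(counts / expected)
    pmi[~np.isfinite(pmi)] = 0.0
    return np.maximum(pmi, 0.0)

def cosine(u, v):
    """Cosine similarity; cosine distance is 1 minus this value."""
    norm = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / norm) if norm > 0 else 0.0

def svd_reduce(matrix, k):
    """Keep the first k SVD dimensions of the row vectors (plain truncation)."""
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k] * s[:k]

# Toy corpus and a hypothetical one-word stop-list, for illustration only.
text = ("the cat chased the mouse the dog chased the cat "
        "the mouse ate the cheese the dog ate the bone")
stop_list = {"the"}
tokens = [t for t in text.split() if t not in stop_list]
vocab = sorted(set(tokens))

counts = cooccurrence_counts(tokens, vocab, window=1)
vectors = ppmi(counts)
reduced = svd_reduce(vectors, k=2)

cat, dog = vocab.index("cat"), vocab.index("dog")
print("cosine(cat, dog), full vectors:", cosine(vectors[cat], vectors[dog]))
print("cosine(cat, dog), 2 SVD dims:  ", cosine(reduced[cat], reduced[dog]))
```

Filtering stop words before windowing, as done here, changes which words count as neighbours; removing them only from the context columns is another possible reading of "applying a stop-list", and stemming would add a further preprocessing step before counting.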

Journal: Behavior Research Methods
Publication Date: Jan 19, 2012
Citations: 251

Similar Papers

EMPIRICAL ANALYSIS OF THE EFFECT OF DIMENSION REDUCTION AND WORD ORDER ON SEMANTIC VECTORS
Laurianne Sitbon ... Christian Prokopp
International Journal of Semantic Computing | VOL. 06
01 Sep 2012

Decision letter: Early language exposure affects neural mechanisms of semantic representations
Jamie Reilly ... Floris P de Lange
23 Jan 2023

Author response: Early language exposure affects neural mechanisms of semantic representations
Xiaosha Wang ... Yanchao Bi
28 Mar 2023

Editor's evaluation: Early language exposure affects neural mechanisms of semantic representations
Jonathan Erik Peelle
23 Jan 2023
