Shabd: A psycholinguistic database for Hindi.

Ark Verma, Vivek Sikarwar, Ranjith Jaganathan, Himanshu Yadav, Pawan Kumar

doi:10.3758/s13428-021-01625-2

Abstract

We present Shabd, a psycholinguistic database for Hindi. It is based on a corpus of 1.4 billion words from electronic newspapers and news websites. Word frequencies and part-of-speech information have been derived and are made available in a cleaned list of 34 thousand hand-selected words, and in a list of 96 thousand words observed more than 100 times in the corpus. Alongside the Shabd database, we also make available a list of all 2.3 million word types and a list of the 2.5 million most frequent word pairs (word bigrams). The quality of the word frequency measure was tested in two lexical decision tasks. We observed that the Shabd word frequencies outperform existing frequencies based on smaller newspaper corpora, but not the Worldlex word frequencies based on an analysis of blogs. We also observed that word frequency accounts for as much variance as contextual diversity (operationalized as the number of documents in which the words were observed). The Shabd database is freely available for research.
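The three corpus measures named in the abstract (word frequency, contextual diversity as a document count, and word bigram frequency) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the toy three-document corpus, the whitespace tokenization, and the threshold value are all assumptions made for illustration only.

```python
from collections import Counter


def corpus_counts(documents):
    """Count words, documents per word, and adjacent word pairs.

    `documents` is a list of token lists (one list per document).
    Hypothetical helper for illustration, not part of Shabd.
    """
    word_freq = Counter()    # total occurrences of each word
    doc_freq = Counter()     # contextual diversity: documents containing the word
    bigram_freq = Counter()  # occurrences of each adjacent word pair
    for tokens in documents:
        word_freq.update(tokens)
        doc_freq.update(set(tokens))  # each word counted once per document
        bigram_freq.update(zip(tokens, tokens[1:]))
    return word_freq, doc_freq, bigram_freq


def per_million(count, total_tokens):
    """Scale a raw count to occurrences per million tokens."""
    return count * 1_000_000 / total_tokens


# Toy corpus (assumption): three tiny "documents", split on whitespace.
docs = [
    "यह एक किताब है".split(),
    "यह एक घर है".split(),
    "किताब अच्छी है".split(),
]

word_freq, doc_freq, bigram_freq = corpus_counts(docs)
total_tokens = sum(word_freq.values())

# Keep only words seen more than `min_count` times. The abstract describes a
# list with a cutoff of 100; 1 is used here only because the toy corpus is tiny.
min_count = 1
frequent = {w: c for w, c in word_freq.items() if c > min_count}

for word, count in word_freq.most_common():
    print(word, count, doc_freq[word], round(per_million(count, total_tokens)))
```

In a corpus where each word occurs at most once per document, word frequency and contextual diversity coincide; they diverge when a word is repeated many times within a few documents, which is why the two measures are compared separately.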

Journal: Behavior Research Methods
Publication Date: Aug 6, 2021
Citations: 5

