The structuralist tradition meets empirical data: Corpus data enhancing the Czech Internet Language Reference Book

Dominika Kováříková, Oleg Kovářík, Kamila Smejkalová, Martin Beneš

doi:10.3366/word.2023.0230

Abstract

This paper demonstrates how the corpus grammar tool GramatiKat can be used to improve and refine morphological information in the Internet Language Reference Book (ILRB), which presents complete declension paradigms for 45,632 standard Czech nouns. The paradigm tables are based mainly on morphological types, following the structuralist conception of language as a fully articulated system. The paper discusses how to update the ILRB so that it provides users with empirically based grammatical information about the individual word form in each cell of the paradigm. All noun lemmas have been examined with GramatiKat, a tool for researching grammatical categories in Czech. The tool compares the distribution of a particular lexeme's word forms with the standard distribution across the whole word class. This allows it to identify nouns with an unusually high frequency of a certain word form, as well as nouns with unattested word forms. GramatiKat draws on data from two corpora of written Czech, SYN2015 and SYN2020 (200 million word tokens). The paper investigates the relationship between defectiveness and overabundance on the one hand and language variability and potentiality on the other. Building on the unique combination of ILRB and GramatiKat data, the paper proposes how unusually frequent or overabundant word forms, as well as unattested ones, should be flagged so that the ILRB provides users with accurate, empirically based data.
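The comparison of a lexeme's word-form distribution with the distribution across its word class can be illustrated with a minimal sketch. It is not GramatiKat's actual implementation, which the abstract does not specify. The pooled-count baseline, the 2.0 ratio threshold, the four-cell toy paradigm, the function names `class_profile` and `flag_cells`, and all counts are hypothetical and chosen only for illustration.

```python
"""Hypothetical sketch: flag paradigm cells whose share of a lexeme's tokens
departs from the class-wide share. Not GramatiKat's actual algorithm."""

from collections import Counter

# Toy paradigm with 4 cells; a real Czech noun paradigm has 14 (7 cases x 2 numbers).
CELLS = ["nom.sg", "gen.sg", "nom.pl", "gen.pl"]


def class_profile(lexemes: dict[str, dict[str, int]]) -> dict[str, float]:
    """Share of each cell in the pooled counts of all lexemes (assumed baseline)."""
    pooled: Counter = Counter()
    for counts in lexemes.values():
        pooled.update(counts)
    total = sum(pooled[cell] for cell in CELLS)
    return {cell: pooled[cell] / total for cell in CELLS}


def flag_cells(
    counts: dict[str, int],
    profile: dict[str, float],
    ratio_threshold: float = 2.0,
) -> tuple[list[str], list[str]]:
    """Return (unusually frequent cells, unattested cells) for one lexeme."""
    total = sum(counts.get(cell, 0) for cell in CELLS)
    unusual: list[str] = []
    if total == 0:
        # No tokens at all: every cell the class uses is unattested.
        return unusual, [cell for cell in CELLS if profile[cell] > 0]
    unattested: list[str] = []
    for cell in CELLS:
        observed = counts.get(cell, 0)
        expected = profile[cell]
        if observed == 0 and expected > 0:
            # The word class uses this cell, but this lexeme never does.
            unattested.append(cell)
        elif expected > 0 and (observed / total) / expected >= ratio_threshold:
            # The lexeme's share is at least ratio_threshold times the class share.
            unusual.append(cell)
    return unusual, unattested


# Invented counts for illustration only; these are not corpus data.
toy_counts = {
    "lexeme_a": {"nom.sg": 50, "gen.sg": 30, "nom.pl": 10, "gen.pl": 10},
    "lexeme_b": {"nom.sg": 45, "gen.sg": 35, "nom.pl": 10, "gen.pl": 10},
    "lexeme_c": {"nom.sg": 5, "gen.sg": 5, "nom.pl": 90, "gen.pl": 0},
}

profile = class_profile(toy_counts)
for lemma, counts in toy_counts.items():
    print(lemma, flag_cells(counts, profile))
```

With these toy counts, only `lexeme_c` is flagged: its nominative plural is unusually frequent and its genitive plural is unattested. A fixed ratio threshold is a simplification. Any real comparison would also have to deal with low-frequency lexemes, where a missing form may be a chance gap in the corpus rather than a genuinely defective cell.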

More From: Word Structure

Similar Papers

Frequency in Incidental Vocabulary Acquisition Research: An Undefined Concept and Some Consequences
Barry Lee Reynolds ... David Wible
TESOL Quarterly, Vol. 48 (28 Oct 2014)

Teaching & Learning Guide for: Word Classes
Jan Rijkhoff
Language and Linguistics Compass, Vol. 3 (01 May 2009)

A Critique of Contemporary Classification of English Words into Lexical (Grammatical) Categories
Ishaya Yusuf Tsojon ... Blessing Ijem Ginikanwa
IOSR Journal of Humanities and Social Science, Vol. 19 (01 Jan 2014)

Google stemming mechanisms
Ahmet Uyar
Journal of Information Science, Vol. 35 (28 May 2009)
