Abstract
Proverbs are an important component of cultural literacy and are therefore often encountered in everyday life. Corpus-based studies of proverbs typically focus on proverb frequency. Here we address the challenges of using general language corpora and corpus searches to estimate proverb frequency. Using two general language corpora from Sketch Engine, HrWaC2.2 and NoTenTen17, we search for semantic counterparts of the same proverb in Croatian and Norwegian. We explore various search options and their (dis)advantages, as well as the issue of proverb modification, which hinders attempts to obtain a reliable picture of proverb frequency in a corpus. Based on the corpus evidence, we provide insights into the types of proverb modification. Finally, we propose that an optimal search tool would consider a proverb's features beyond frequency, such as its semantic class, syntactic complexity, and abstractness, as contributors to its cognitive load, allowing better selection of these expressions for clinical testing, education, research, and entertainment.