This article addresses the principles for selecting paronyms and the methodology for compiling an electronic German-Ukrainian dictionary of paronyms. It takes a corpus-based approach, using systematic analysis to identify similarities and differences in how paronyms are used in speech. The author presents a concept of paronymy as a systemic phenomenon involving etymologically related words with similar morphemic structures but different meanings. A comparative analysis of collocations identified 1,836 paronyms, organized into paronymic series of two to nine components each. The article examines existing dictionaries of paronyms and highlights their lexicographic shortcomings: an excessively normative (prescriptive) orientation, dictionary entries that are inadequately developed from both linguistic and didactic perspectives, and a lack of rigorous evaluation of modern data to inform dictionary compilation. An ideal paronym dictionary should rely on corpus data, providing reliable information and generalizations about how individual words are used within conventionalized language usage. Corpus-based tools, such as those used for contrasting near-synonyms, make it possible to identify similarities and differences between paronyms systematically and contrastively. The degree of semantic closeness among expressions with similar contextual usage is determined by comparing their immediate collocational patterns, so the recorded closeness or distance between paronyms reflects the overlap of their contexts of use. The selected paronyms were processed lexicographically with TshwaneLex, dictionary compilation software that allows lexicographers to develop dictionaries for any language without requiring advanced IT skills. TshwaneLex offers features such as automatic cross-reference tracking, advanced dictionary comparison and merge functionality, and support for all world languages through full Unicode compatibility.
The lemma editing process uses a tree-based interface to represent the hierarchical structure of the lemma, including different meanings, submeanings, word formations, multi-word units, examples of usage, cross-references, and more. Furthermore, the input/output architecture is designed to support the development of additional interfaces for various data sources as add-ons or plug-ins, facilitating other output formats. The article notes that a PHP-based software module is currently available for hosting the TshwaneLex dictionary on the internet, providing an accessible platform for electronic dictionary publication.
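The comparison of immediate collocational patterns described above can be illustrated with a minimal sketch. It assumes that each paronym is represented by a set of collocates and that overlap is scored with the Jaccard coefficient; the abstract does not name the measure actually used, so this is one plausible choice. The function names and the collocate lists are hypothetical, and the toy data is invented for illustration, not drawn from a corpus.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard overlap of two collocate sets: |A & B| / |A | B|."""
    a, b = set(a), set(b)
    union = a | b
    # Two empty collocate sets share nothing measurable; score them 0.0.
    return len(a & b) / len(union) if union else 0.0


def pairwise_overlap(series):
    """Score every pair of words in a paronymic series.

    `series` maps each paronym to its set of collocates. Pairs are
    keyed in alphabetical order so that lookups are predictable.
    """
    return {
        (w1, w2): jaccard(series[w1], series[w2])
        for w1, w2 in combinations(sorted(series), 2)
    }


# Invented toy collocates (assumption for illustration, not corpus data).
toy_series = {
    "effektiv": {"Schutz", "Methode", "Behandlung", "Mittel"},
    "effizient": {"Methode", "Nutzung", "Verfahren", "Mittel"},
}

scores = pairwise_overlap(toy_series)
for pair, score in scores.items():
    print(pair, round(score, 3))
```

A score near 1 would indicate that two paronyms share most of their collocates and are contextually close, while a score near 0 would indicate clearly distinct usage. With real corpus data, collocates would normally be filtered by an association measure before comparison.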