Abstract
In this paper, Ukrainian word embeddings and their properties are examined. Provided are a theoretical description, a brief account of the most common technologies used to produce embeddings, and lists of implemented algorithms. Word2vec, the first technology for calculating word embeddings, is used to demonstrate modern neural-network-based approaches to their computation. Word2vec is compared with FastText, which evolved from it, and FastText's benefits are described. Word embeddings have been applied to the majority of practical natural language processing tasks. One of the latest such applications is the automatic construction of translation dictionaries. A previous analysis indicates that most of the words found in English-Ukrainian dictionaries are absent from the Great Electronic Dictionary of the Ukrainian Language (VESUM) project. For Ukrainian embeddings based on word2vec, GloVe, LexVec, and FastText, the Gensim open-source library was used to demonstrate the potential of the calculated models, and the results of replicating known experiments are provided. They indicate that the hypothesis about the existence of biases and stereotypes in such models does not hold for the Ukrainian language. The quality of the word embeddings is assessed by testing analogies, and adapting lexical data from a Ukrainian associative dictionary to construct a data set for assessing the quality of word embeddings is proposed. Listed are the necessary tasks for future research in the field of creating and utilizing Ukrainian word embeddings.
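As a minimal sketch of the Gensim-based analogy testing described above (the model file name is hypothetical, and the query words are illustrative, not taken from the paper's test set):

```python
from gensim.models import KeyedVectors

# Load pre-trained Ukrainian vectors in word2vec text format.
# "ukrainian_vectors.txt" is a placeholder path for any such model.
vectors = KeyedVectors.load_word2vec_format("ukrainian_vectors.txt", binary=False)

# Classic analogy query: "король" (king) - "чоловік" (man) + "жінка" (woman)
# should rank "королева" (queen) among the nearest neighbours.
for word, similarity in vectors.most_similar(
        positive=["король", "жінка"], negative=["чоловік"], topn=5):
    print(f"{word}\t{similarity:.3f}")
```

For evaluating a model against a whole file of analogy questions at once, Gensim also provides the `evaluate_word_analogies` method, which reports accuracy per section of the question file.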