Slovene and Croatian word embeddings in terms of gender occupational analogies

Matej Ulčar, Marko Robnik-Šikonja, Anka Supej, Senja Pollak

doi:10.4312/slo2.0.2021.1.26-59

Open Access

Journal: Slovenščina 2.0: empirical, applied and interdisciplinary research
Publication Date: Jul 6, 2021
Citations: 1
License type: CC BY-SA 4.0

Affiliation: University of Ljubljana, Jožef Stefan Institute

Abstract

In recent years, the use of deep neural networks and dense vector embeddings for text representation has led to excellent results in computational natural language understanding. It has also been shown that word embeddings often capture gender, racial, and other types of bias. This article evaluates Slovene and Croatian word embeddings for gender bias using word analogy calculations. We compiled a list of masculine and feminine nouns for occupations in Slovene and evaluated the gender bias of fastText, word2vec, and ELMo embeddings under different configurations and with different approaches to analogy calculation. The fastText embeddings showed the lowest occupational gender bias. We similarly compared different fastText embeddings on Croatian occupational analogies.
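
To make the idea of an occupational gender analogy concrete, here is a minimal sketch of one common analogy calculation, the vector-offset method (often called 3CosAdd). The abstract does not say which analogy formulas the authors used, so this choice is an assumption. The four-dimensional vectors are invented toy values, not taken from any fastText, word2vec, or ELMo model, and the helper names `normalize`, `analogy`, and `analogy_accuracy` are hypothetical.

```python
import numpy as np


def normalize(v):
    """Scale a vector to unit length."""
    return v / np.linalg.norm(v)


def analogy(emb, a, b, c, topn=1):
    """Solve a : b :: c : ? with the vector-offset (3CosAdd) method.

    Ranks every word d (except a, b, c) by cos(d, b - a + c).
    """
    target = normalize(normalize(emb[b]) - normalize(emb[a]) + normalize(emb[c]))
    scores = {
        w: float(normalize(v) @ target)
        for w, v in emb.items()
        if w not in (a, b, c)
    }
    return sorted(scores, key=scores.get, reverse=True)[:topn]


def analogy_accuracy(emb, pairs, male="moški", female="ženska"):
    """Fraction of (masculine, feminine) occupation pairs for which
    male : masculine :: female : ? returns the feminine form first."""
    hits = sum(analogy(emb, male, m, female)[0] == f for m, f in pairs)
    return hits / len(pairs)


# Toy embeddings (invented for illustration only).
# Dimensions: [masculine, feminine, medicine, teaching]
emb = {
    "moški": np.array([1.0, 0.0, 0.0, 0.0]),       # man
    "ženska": np.array([0.0, 1.0, 0.0, 0.0]),      # woman
    "zdravnik": np.array([1.0, 0.0, 1.0, 0.0]),    # doctor (masculine)
    "zdravnica": np.array([0.0, 1.0, 1.0, 0.0]),   # doctor (feminine)
    "učitelj": np.array([1.0, 0.0, 0.0, 1.0]),     # teacher (masculine)
    "učiteljica": np.array([0.0, 1.0, 0.0, 1.0]),  # teacher (feminine)
}

pairs = [("zdravnik", "zdravnica"), ("učitelj", "učiteljica")]

print(analogy(emb, "moški", "zdravnik", "ženska"))
print(analogy_accuracy(emb, pairs))
```

With real embeddings, a biased model might answer "moški : zdravnik :: ženska : ?" with a different, stereotypically female occupation rather than the feminine form of the same occupation. Under this sketch's assumptions, the share of such misses over a list of occupation pairs gives one rough indicator of occupational gender bias.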
