Abstract

Recent advances in natural language processing (NLP) have improved many systems that rely on natural language to communicate better with their users. One of the main problems in applying NLP to multiple languages is the lack of tools for developing such systems. Croatian is a highly inflected language from the Slavic language family, and traditional models that perform well on English perform poorly on morphologically rich languages. In this article we present a model for creating word embeddings for morphologically rich languages such as Croatian. We evaluate the generated word embeddings on a newly created word similarity corpus based on an English word similarity corpus. In this evaluation we compare our embeddings with two of the best word representation models for English. We also compare our approach with multilingual models such as FastText. The word embeddings created in this article will be used to develop a component for training neural models for the semantic understanding of sentences written in Croatian. These language tools can be used in many systems that require natural language understanding (NLU) and natural language generation (NLG). In the introduction we give an overview of word embeddings: what they are, which models create them, and where they can be used. In the second section we describe some of the best models for creating word embeddings. In the third section we present a framework for developing and evaluating word embeddings for Croatian. In the conclusion we emphasize the importance of developing tools for Croatian and outline future research.
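Word similarity evaluation is commonly done as follows, though the abstract does not spell out the article's exact procedure, so this protocol is an assumption. For each word pair in the corpus, compute the cosine similarity of the two embeddings. Then report Spearman's rank correlation between those model scores and the human ratings, skipping pairs with out-of-vocabulary words. The sketch below uses plain Python. The embeddings, word pairs, and human scores are hypothetical toy values, not data from the article, and the helper names (`cosine`, `spearman`, `evaluate`) are illustrative.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman's rho, computed as the Pearson correlation of the rank vectors."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def evaluate(embeddings, pairs):
    """Score (word1, word2, human_score) triples; out-of-vocabulary pairs are skipped.

    Returns (rho, number_of_pairs_used).
    """
    model_scores, human_scores = [], []
    for w1, w2, gold in pairs:
        if w1 in embeddings and w2 in embeddings:
            model_scores.append(cosine(embeddings[w1], embeddings[w2]))
            human_scores.append(gold)
    return spearman(model_scores, human_scores), len(model_scores)


# Hypothetical toy data: 3-dimensional vectors and made-up human ratings (0-10).
toy_embeddings = {
    "kralj": [0.9, 0.1, 0.3],
    "kraljica": [0.8, 0.2, 0.35],
    "auto": [0.1, 0.9, 0.2],
    "automobil": [0.15, 0.85, 0.25],
    "kruh": [0.3, 0.2, 0.9],
}
toy_pairs = [
    ("kralj", "kraljica", 8.5),
    ("auto", "automobil", 9.6),
    ("kralj", "kruh", 0.8),
    ("auto", "kruh", 1.2),
    ("kralj", "pas", 2.0),  # "pas" is not in the toy vocabulary, so it is skipped
]

rho, used = evaluate(toy_embeddings, toy_pairs)
print(f"Spearman rho = {rho:.3f} over {used} pairs")
```

Spearman's correlation compares rankings rather than raw values, so the model's cosine scores and the human ratings do not need to share a scale. Reporting how many pairs were actually scored matters for an inflected language like Croatian. A model with poor coverage of inflected forms may skip many pairs, and that affects how its correlation should be interpreted.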
