Abstract

This paper presents a bottom-up approach to building a comprehensive infrastructure for the analysis of user-generated content for several South Slavic languages (Slovene, Croatian, Serbian). The goal of this collaboration was to leverage the available know-how and language similarity in order to provide language resources and tools for the study of netspeak for all three languages in parallel and with minimal resources. We demonstrate the usefulness of the developed infrastructure for a corpus-based, comparative sociolinguistic investigation of language attitudes among Slovenian, Croatian, and Serbian Twitter users, who have witnessed a rapid codification divergence and reinforcement of national languages after the dissolution of Yugoslavia in the early 1990s.

Highlights

  • The increasing popularity of Web 2.0 has resulted in an unprecedented surge of user-generated and social media content, which is rapidly becoming a major source of knowledge and opinion and is considered a catalyst of bottom-up communication practices that contribute towards the democratization of language

  • In this paper we showcase our approach to building a comprehensive infrastructure for the analysis of user-generated content (UGC) for several South Slavic languages (Slovene, Croatian, Serbian) in parallel, as initiated by the JANES (Fišer, Ljubešić, and Erjavec) and ReLDI (Samardžić, Ljubešić, and Miličević) projects

  • The tools are described in detail in Fišer et al., so we focus on presenting the UGC-specific tools that were developed for all three languages in the context of the collaboration between the JANES and ReLDI projects


Summary

Introduction

The increasing popularity of Web 2.0 has resulted in an unprecedented surge of user-generated and social media content, which is rapidly becoming a major source of knowledge and opinion and is considered a catalyst of bottom-up communication practices that contribute towards the democratization of language. The work was primarily motivated by the close relations among the languages in question, by the uneven socioeconomic circumstances of the past three decades in the countries where they are spoken, and by the unequal development of research infrastructure for computational and corpus linguistics, which is most mature in Slovenia and least so in Serbia. The goal of this bottom-up collaboration was to leverage the available know-how and language similarity in order to provide new language resources and tools for the study of netspeak for the three languages in parallel, with minimal investment of researchers' time, effort, and finances.

