Abstract
This paper presents a bottom-up approach to building a comprehensive infrastructure for the analysis of user-generated content for several South Slavic languages (Slovene, Croatian, Serbian). The goal of this collaboration was to leverage the available know-how and language similarity in order to provide language resources and tools for the study of netspeak for all three languages in parallel and with minimal resources. We demonstrate the usefulness of the developed infrastructure for a corpus-based, comparative sociolinguistic investigation of language attitudes by Slovenian, Croatian, and Serbian Twitter users, who have witnessed a rapid codification divergence and reinforcement of national languages after the dissolution of Yugoslavia in the early 1990s.
Highlights
The increasing popularity of Web 2.0 has resulted in an unprecedented surge of user-generated and social media content which is rapidly becoming a major source of knowledge and opinion, and is considered a catalyst of bottom-up communication practices that contribute towards the democratization of language.
In this paper we showcase our approach to building a comprehensive infrastructure for the analysis of user-generated content (UGC) for several South Slavic languages (Slovene, Croatian, Serbian) in parallel, as initiated by the JANES (Fišer, Ljubešić, and Erjavec) and ReLDI (Samardžić, Ljubešić, and Miličević) projects.
The tools are described in detail in Fišer et al., so we focus on presenting the UGC-specific tools that were developed for all three languages in the context of the collaboration between the JANES and ReLDI projects.
Summary
The increasing popularity of Web 2.0 has resulted in an unprecedented surge of user-generated and social media content which is rapidly becoming a major source of knowledge and opinion, and is considered a catalyst of bottom-up communication practices that contribute towards the democratization of language. The work was primarily motivated by the close relations among the languages in question, by the uneven socioeconomic circumstances over the past three decades in the countries where they are spoken, and by the unequal development of research infrastructure for computational and corpus linguistics, which is most mature in Slovenia and least so in Serbia. The goal of this bottom-up collaboration was to leverage the available know-how and language similarity in order to provide new language resources and tools for the study of netspeak for the three languages in parallel, with minimal investment of researchers’ time, effort, and finances.