Abstract
This paper presents a bottom-up approach to building a comprehensive infrastructure for the analysis of user-generated content in three South Slavic languages (Slovene, Croatian, and Serbian). The goal of this collaboration was to leverage the available know-how and the similarity of the languages in order to provide language resources and tools for the study of netspeak for all three languages in parallel and with minimal resources. We demonstrate the usefulness of the developed infrastructure in a corpus-based, comparative sociolinguistic investigation of the language attitudes of Slovenian, Croatian, and Serbian Twitter users, who have witnessed a rapid divergence in codification and a reinforcement of national languages since the dissolution of Yugoslavia in the early 1990s.