Abstract
The paper describes an ongoing project that aims at building a reference corpus of German computer-mediated communication (CMC) as a new component of an existing reference corpus of written contemporary German. The ‘Deutsches Referenzkorpus zur internetbasierten Kommunikation’ (DeRiK) will include data from the most prominent CMC genres among German Internet users and thus close a gap in the coverage of the corpus resources of the project “Digitales Wörterbuch der deutschen Sprache” (DWDS), which are maintained and provided by the Berlin-Brandenburg Academy of Sciences and the Humanities (BBAW). The focus of the paper is on the role of the DeRiK component within the DWDS framework, on sampling issues, and on CMC-specific issues of corpus annotation.

1. Project Background and Focus of the Paper

In view of the increasing amount of reading and writing that people do on the Internet, up-to-date corpora of written contemporary language must take into account the impact of computer-mediated communication (CMC) on contemporary language and thus include samples of emerging written genres such as e-mail, weblogs, microblogging on Twitter, discussion boards and wiki discussions, chats and instant messaging conversations, and communication on social network sites. In this paper we present selected aspects of an ongoing project that aims at building a reference corpus of German CMC, called DeRiK (‘Deutsches Refe