Abstract

This article introduces the first version of the Corpus of Singapore English Messages (CoSEM), a 3.6‐million‐word monitor corpus of online text messages collected between 2016 and 2019, compiled and managed by a group of scholars who share an interest in Colloquial Singapore English (CSE) research. The paper explains the motivations behind developing a new corpus for the investigation of CSE. It also documents the process of compiling and organizing CoSEM and describes the corpus's initial structure and composition. We further discuss the social variables used in tagging the data, as well as ethical challenges, advantages, and disadvantages unique to online message datasets. In addition, we present preliminary analyses of two selected CSE features: (1) the Hokkien‐derived expression (bo)jio and (2) sentence‐final adverbs (already, also, only). As CoSEM is an ongoing project, we conclude the article with notes on future directions.
