Abstract

It is well known that Outer Circle English has undergone extensive contact-induced lexical and grammatical restructuring. Can common NLP tools developed for Inner Circle English be used to process Outer Circle English texts? Here, we report our experience of using the Stanford PoS tagger to tag the Singaporean component of the International Corpus of English (ICE-SIN). We isolate two major contact-related causes of tagging errors: (1) lexical and grammatical loans borrowed directly from the local languages; and (2) English-origin words that have acquired new grammatical meanings from the local languages. While the first type may be easy to overcome, the second type is intractable and creates an extra layer of morphosyntactic complexity. We achieved comparable accuracy rates in the more formal registers, and a lower but still respectable 88% in the informal register of private conversations. A tagged ICE-SIN allows us to investigate lexical and grammatical restructuring at an unprecedented level of detail.
