Abstract

The classification of corpora is not confined to the genre and nature of texts; it extends well beyond these. In this chapter, we show that a corpus can also be classified by the type of text it contains and by the purpose of its design. By type of text, a corpus can be termed a ‘monolingual corpus’, which contains text samples from a single language or dialect variety; a ‘bilingual corpus’, which carries proportional amounts of text from two languages or dialect varieties (which may or may not be genealogically, typologically or geographically related); or a ‘multilingual corpus’, which stores a substantial amount of language data, proportionally distributed across text types, from more than two languages. By purpose of design, a corpus can be termed an ‘unannotated corpus’, in which text samples are kept in their raw form without metadata or annotation of any kind; or an ‘annotated corpus’, in which texts are annotated or tagged with various intralingual and extralingual information. We also describe the ‘maxims of corpus annotation’ proposed by earlier scholars, analyze the issues involved in corpus annotation, discuss the challenges directly and indirectly linked with it, and finally outline the state of the art of corpus annotation in English and other languages across the world.

