Abstract
Corpus findings are only useful if the corpus adequately represents the content and language of the target domain; yet few studies evaluate or report representativeness. This paper argues that corpus linguists should focus explicitly on the validation process. It introduces the innovative concept of a Representativeness Argument, which is an explicit statement of reliability and validity to enable defensible applications of a corpus for a specifically defined purpose and audience. Adapted from Toulmin's (1958/2003) argument model, its originality lies in its attention to both target domain and linguistic representativeness, and in the critical role played by expert judgements. To illustrate this approach, I present a representativeness argument for the 1.98-million-word ‘DSVC-IL’ corpus, which was compiled to investigate the discipline-specific vocabulary required for reading postgraduate International Law texts. The corpus is demonstrated to adequately represent target domain content, established by analysing modules and reading lists, and confirmed by experts. The language is shown to adequately reflect the domain through analysis of a 1026-flemma Single Word List, extracted using measures of frequency, keyness, range and evenness of distribution. List items are evenly distributed in randomly split corpus halves (rs=.98, p<.00). The list provides similar coverage of the DSVC-IL (26.37%) and other texts from the domain (23.87%). Moreover, Law experts confirmed that the majority of list items were Law words. Together, the evidence supports the usefulness of the corpus and list for their explicitly defined purpose.
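The two quantitative checks reported above, lexical coverage and the split-half rank correlation, can be illustrated with a minimal Python sketch. It is not the study's actual procedure: the function names are hypothetical, texts are assumed to be pre-tokenised lists of strings, list membership is assumed to be exact token matching (the study counts flemmas, which would require grouping word forms first), and the random split is assumed to operate on whole texts.

```python
import random
from collections import Counter


def coverage(tokens, word_list):
    """Percentage of running tokens that belong to the word list."""
    listed = set(word_list)
    hits = sum(1 for tok in tokens if tok in listed)
    return 100.0 * hits / len(tokens)


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation of the two rank vectors.

    Raises ZeroDivisionError if either input is constant.
    """
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def split_half_rho(texts, word_list, seed=0):
    """Randomly split texts into two halves and correlate list-item frequencies."""
    shuffled = list(texts)
    random.Random(seed).shuffle(shuffled)  # assumption: split by whole texts
    mid = len(shuffled) // 2
    half_a = Counter(tok for text in shuffled[:mid] for tok in text)
    half_b = Counter(tok for text in shuffled[mid:] for tok in text)
    freqs_a = [half_a[w] for w in word_list]
    freqs_b = [half_b[w] for w in word_list]
    return spearman(freqs_a, freqs_b)
```

In this sketch, a coverage figure is obtained by calling `coverage` once on the corpus tokens and once on held-out texts from the same domain, and a high value from `split_half_rho` indicates that list items are ranked similarly in both halves. The sketch does not compute a p-value.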