Abstract

Challenges to issues of balance and representativeness in African lexicography

Highlights

  • McEnery and Wilson (1996: 24) mention a reliance on computers in their definition of a corpus as "a finite-sized body of machine-readable text, sampled in order to be maximally representative of the language variety under consideration"

  • An attempt has been made to show that, while corpus research remains one of the most useful approaches to language research in that it can speedily offer information for addressing language-related issues and problems, a critical look at the process of corpus construction and inclusion would help determine if generalisations drawn from its results can be trusted as a true reflection of language use

  • The African context is unique in that, unlike Western communities, many African countries do not use their languages for academic purposes, in the media, or for governmental and official communication, making machine-readable data (MRD) difficult to access

Summary

Introduction

More and more lexicographers realise the inevitability of using a corpus or corpora in the compilation of dictionaries. Leech (1991: 8) defines a corpus as "a sufficiently large body of naturally occurring data of the language to be investigated". Renouf (1987: 1) refers to the use of computers in the storing and analysis of corpora, defining a corpus as "a collection of texts, of written or spoken words, which is stored and processed on computer for the purpose of linguistic research". McEnery and Wilson (1996: 24) likewise mention a reliance on computers in their definition of a corpus as "a finite-sized body of machine-readable text, sampled in order to be maximally representative of the language variety under consideration". Leech (1991: 5) insists that a corpus has to be differentiated from an "archive", the latter being a repository of available language materials and the former a systematic collection of material for given purposes. It is generally agreed that in a speech community the spoken word exists in abundance compared to written texts. Taking these linguistic arguments as a basis and applying them by implication to issues of balance and representativeness, it can be concluded that if corpus construction has to reflect the different ratios between spoken and written texts, different text genres and various dialectal varieties, the percentage of spoken language in a corpus has to be much greater than that of written language.
The inadequacy of speech in the BNC, which contains about 90 per cent written data and 10 per cent spoken data, has been recognised: spoken language, as the primary channel of communication, should by rights be given more prominence than this. In practice this has not been possible, since it is a skilled and very time-consuming task to transcribe speech into the computer-readable orthographic text that can be processed to extract linguistic information. In view of this problem, these proportions were chosen as realistic targets which, given the size of the BNC, are sufficiently large to be broadly representative. If corpora do not reflect in their composition that the spoken word is more common in real life than the written text, the power and authority of corpora as sources of evidence for linguistic research are called into question.
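
The written/spoken proportions discussed above can be checked for any corpus with a simple token count per mode. The following is a minimal Python sketch under stated assumptions: the function name `mode_shares`, the mode labels and the token counts are all hypothetical and are chosen only so that the result mirrors the roughly 90/10 split reported for the BNC. It is not the procedure used by the BNC compilers or by the author.

```python
from collections import Counter


def mode_shares(texts):
    """Return each mode's share of the total token count.

    `texts` is an iterable of (mode, token_count) pairs, where `mode`
    is a label such as "spoken" or "written" (illustrative labels).
    """
    totals = Counter()
    for mode, token_count in texts:
        totals[mode] += token_count
    grand_total = sum(totals.values())
    if grand_total == 0:
        raise ValueError("corpus contains no tokens")
    return {mode: count / grand_total for mode, count in totals.items()}


# Hypothetical corpus: these token counts are invented for illustration.
sample_corpus = [
    ("written", 60_000),
    ("written", 30_000),
    ("spoken", 10_000),
]

shares = mode_shares(sample_corpus)
for mode, share in sorted(shares.items()):
    print(f"{mode}: {share:.0%}")
# prints:
# spoken: 10%
# written: 90%
```

A count of this kind only describes the composition of a corpus. Whether a given split is adequately balanced or representative remains the judgement the discussion above is concerned with.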

A Newspaper versus the Purchase of a Pair of Shoes
Conclusion