Abstract

This volume contains 15 selected papers originally presented at the 5th North American Symposium on Corpus Linguistics at Montclair State University, New Jersey, in 2004. Focusing on the use of corpora to study domains ‘beyond the word’, the symposium covered a wide range of topics, from corpus creation, discourse and register variation to applications in language or medical education, most of them involving tools, approaches, and statistical techniques new to corpus linguistics. The editor uses the phrase ‘beyond the word’ to cover everything above the level of the word, from phrases to pragmatics. The papers are arranged in two sections: the first on syntactic analysis tools and corpus annotation, the second on applications in pedagogy and linguistic analysis. In the first chapter, Barrett, Greenberg, and Schwartz report on exploratory research into automatically selecting documents from distinct domains for machine translation corpora, with the aim of improving machine translation output. Their method rests on the assumption that texts from different domains have different syntactic profiles. A comparison of part-of-speech tag densities in seven hand-selected documents from four domains (medical, financial, military, and narrative) suggests that texts from the same domain have directly correlated tag profiles. This method could also be applied to genre, register or authorship analysis. In the second chapter, Grieve-Smith investigates whether it is actually possible to exclude grammatical sources of covariation from a list of markers of register, genre or style variation. Using Douglas Biber's (e.g. Biber, 1988) notion of the ‘envelope of variation’, in which grammatical features are counted ‘as a proportion of the opportunities for these features to be produced’ (p. 21), Grieve-Smith analyses two features in the Michigan Corpus of Academic Spoken English (MICASE) that, according to Biber, correlate but do not co-vary.
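The two counting ideas mentioned above can be illustrated with a small sketch. It is not the chapter authors' code. The tag-density comparison assumes Pearson correlation over relative tag frequencies, and the review does not say which statistic Barrett, Greenberg, and Schwartz used. The tagset, the toy tag sequences, and all function names (`tag_density`, `pearson`, `rate_per_thousand`, `envelope_proportion`) are hypothetical. The last two functions contrast a conventional per-1,000-words rate with a count taken as a proportion of opportunities, in the sense of the ‘envelope of variation’ quoted above.

```python
from collections import Counter
from math import sqrt

# Hypothetical, simplified tagset used only for this illustration.
TAGSET = ["NN", "JJ", "IN", "VB", "PRP"]

def tag_density(tags, tagset=TAGSET):
    """Relative frequency of each tag in `tagset` within one document's tag sequence."""
    counts = Counter(tags)
    total = len(tags)
    return [counts[t] / total for t in tagset]

def pearson(x, y):
    """Pearson correlation between two equal-length density profiles (an assumed choice of statistic)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("a profile with zero variance has no defined correlation")
    return cov / (sx * sy)

def rate_per_thousand(count, n_words):
    """Conventional normalisation: occurrences per 1,000 words of running text."""
    return count * 1000 / n_words

def envelope_proportion(realised, opportunities):
    """Occurrences as a share of the contexts in which the feature could have occurred."""
    return realised / opportunities

# Toy tag sequences (invented): two noun-heavy 'medical' texts and one pronoun-heavy 'narrative' text.
med1 = "NN JJ NN IN NN VB NN JJ NN IN".split()
med2 = "JJ NN IN NN NN VB JJ NN IN VB".split()
narr = "PRP VB IN PRP VB NN PRP VB JJ PRP".split()

same_domain = pearson(tag_density(med1), tag_density(med2))   # about 0.94
cross_domain = pearson(tag_density(med1), tag_density(narr))   # negative

print(f"medical vs medical:   {same_domain:.3f}")
print(f"medical vs narrative: {cross_domain:.3f}")
```

With these invented sequences, the two "medical" profiles correlate strongly and the "narrative" profile correlates negatively with them. This is the kind of contrast the chapter's method relies on. For the second idea, suppose a feature occurs 30 times in 10,000 words and has 120 possible sites. It then scores 3.0 per 1,000 words but 0.25 of its opportunities. Only the second figure is unaffected by how often the relevant contexts happen to arise in a text.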
