Abstract

To some extent, we seem to use language in chunks – multiple words that are co-selected and used as gestalt units. By some estimates, these chunks constitute more than 50 percent of a given text (Erman and Warren, 2000). The extent to which our communication is composed of these units has broad implications for linguistic theory, psycholinguistics and applied linguistics, and is therefore the focus of this study. The study shows that claims made regarding the nature of formulaic language (Sinclair, 1991) lead to a method for the automatic detection of holistically used multi-word patterns in text corpora, which in turn allows the ‘chunkiness’ of linguistic corpora to be estimated. These estimates may be useful for materials development in language teaching, as well as for corpus linguistic and psycholinguistic studies.
