Abstract

The Process Corpus of English in Education (PROCEED) is a learner corpus of English which, in addition to written texts, consists of data that make the writing process visible in the form of keystroke log files and screencast videos. It comes with rich metadata about each learner, including indices of exposure to the target language and cognitive measures such as working memory or fluid intelligence. It also includes an L1 component which is made up of similar data produced by the learners in their mother tongue. PROCEED opens new perspectives in the study of learner writing by going beyond the written product. It makes it possible to investigate aspects such as writing fluency, use of online resources, cognitive phenomena like automaticity and avoidance, or theoretical modelling of the writing process. It also has applications for teaching, e.g. by showing students screencast video clips from the corpus illustrating effective writing strategies, as well as for testing, e.g. by establishing a corpus-derived standard of writing fluency for learners at a certain proficiency level.

Highlights

  • FROM WRITTEN PRODUCT TO WRITING PROCESS: The first electronic corpus ever, the Brown Corpus, was a corpus of written English.

  • Of learner corpora, i.e. corpora consisting of language produced by foreign or second language (L2) learners, 64 per cent are made up of written texts only, according to the current version of the Learner Corpora around the World list maintained by the Centre for English Corpus Linguistics (2020).

  • A notable exception is Wengelin (2006), who describes her data sets, consisting of keystroke log files for Swedish texts, as corpora. She shows how the techniques of corpus linguistics can be applied to the study of pauses in writing by looking for ‘microcontexts’ made up of a pause preceded and followed by certain elements.


Summary

INTRODUCTION

The first electronic corpus ever, the Brown Corpus, was a corpus of written English. Since then, many corpora have been collected that represent written language. A notable exception is Wengelin (2006), who describes her data sets, consisting of keystroke log files for Swedish texts, as corpora. She shows how the techniques of corpus linguistics can be applied to the study of pauses in writing by looking for ‘microcontexts’ made up of a pause preceded and followed by certain elements (e.g. a pause preceded by a typed letter and followed by a deletion). Hamel and Séror (2016: 156) use the term corpus to describe a collection of screencast videos showing the writing process of L2 learners of French and English. They point out that such corpora represent new and exciting forms of empirical data which, once anonymized, could contribute to learner corpus projects that might be shared with others.

The L1 data are collected according to the same principles as the L2 data: the learners have about 45 minutes to write a 350-word argumentative text on one of several set topics/quotes, while their screen and keyboard activity is recorded with their permission.
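The ‘microcontext’ search attributed to Wengelin (2006) above, a pause preceded by a typed letter and followed by a deletion, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the `Event` record, its field names, the `"BACKSPACE"` label and the 2000 ms pause threshold are hypothetical and do not reflect the actual keystroke log format used in PROCEED or in Wengelin's data.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Event:
    """One keystroke in a hypothetical, simplified log."""
    time_ms: int  # time of the key press, in milliseconds (assumed field)
    key: str      # a single character, or "BACKSPACE" for a deletion (assumed label)


def find_microcontexts(events: List[Event],
                       pause_ms: int = 2000) -> List[Tuple[Event, Event]]:
    """Return (before, after) pairs in which a typed letter is followed,
    after a pause of at least pause_ms, by a deletion.

    The default threshold is an illustrative assumption, not a value
    taken from the source.
    """
    hits = []
    # Walk over each pair of consecutive events in the log.
    for before, after in zip(events, events[1:]):
        gap = after.time_ms - before.time_ms
        typed_letter = len(before.key) == 1 and before.key.isalpha()
        if gap >= pause_ms and typed_letter and after.key == "BACKSPACE":
            hits.append((before, after))
    return hits


# Invented example log: the writer types "ther", pauses, then deletes.
log = [
    Event(0, "t"), Event(180, "h"), Event(350, "e"), Event(520, "r"),
    Event(3100, "BACKSPACE"), Event(3300, " "), Event(3500, "BACKSPACE"),
]
matches = find_microcontexts(log)
print([(b.key, a.key, a.time_ms - b.time_ms) for b, a in matches])
```

Treating the pause as the gap between two consecutive events keeps the search to a single pass over the log; a real study would need to choose the threshold and the event categories according to its own research question.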

The metadata
Writing process research
Teaching and testing applications
Findings
CONCLUSION