Abstract

Producing written texts is a non-linear process: in contrast to speech, writers are free to change already written text at any place and at any point in time. Linguistic considerations are likely to play an important role, but so far, no linguistic models of the writing process exist. We present an approach for the analysis of writing processes with a focus on linguistic structures, based on the novel concepts of transforming sequences, text history, and sentence history. Processing raw keystroke-logging data and applying natural language processing tools allow product and process data to be extracted, filtered, and stored in a hierarchical data structure. This structure is used to re-create and visualize the genesis and history of a text and of its individual sentences. Focusing on sentences as the primary building blocks of written language and full texts, we aim to complement established writing process analyses and, ultimately, to interpret writing timecourse data with respect to linguistic structures. To enable researchers to explore this view, we provide a fully functional implementation of our approach as an open-source software tool, together with visualizations of the results. We report on a small-scale exploratory study in German in which we used our tool. The results indicate both that the approach is feasible and that writers actually revise on a linguistic level. The latter confirms the need for modeling written text production from the perspective of linguistic structures beyond the word level.

Highlights

  • Producing written text is a non-linear process: during production, writers are free to modify the text at any place and at any point in time, without leaving any traces in the final product—in contrast to speech

  • Even though linguistic considerations play an important role in the text production process—Cookson mentions that “Writing is a linguistically based process” (Cookson, 1989, p. 20)—and metalinguistic awareness is one competence of professional writers (Horning, 2006, p. 119), there are currently no suitable approaches that would allow us to process large amounts of production data and test the explanatory power of linguistic theories

  • We provide a first implementation in the form of an open-source stand-alone analysis tool called Text History Extraction tool (THEtool), intended as a proof of concept and as a starting point for further development and exploration

Introduction

Producing written text is a non-linear process: during production, writers are free to modify the text at any place and at any point in time, without leaving any traces in the final product, in contrast to speech. Recording these processes with keystroke-logging tools (for an overview of current technology and its evolution, see Lindgren et al., 2019a) allows researchers to investigate how writers produce texts. Writing process data is collected and stored in chronological order as the writer adds and removes characters. This data makes it possible to follow, replay, and investigate the non-linear evolution of the text. Whether models and implementations based on linguistic theories can be used for analyzing and modeling writing processes has not yet been systematically studied.
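To make the idea of replaying chronologically stored process data concrete, the following is a minimal sketch in Python. It is not the implementation used in THEtool; the `KeyEvent` record, its fields, and the helper functions `apply_event`, `replay`, and `is_nonlinear` are hypothetical names chosen for illustration, and real keystroke logs typically carry more information (for example cursor movements and pauses). The sketch assumes that each logged event records a timestamp, a character offset, and the characters that were inserted or deleted.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class KeyEvent:
    """Hypothetical, simplified keystroke-log record (an assumption, not a real log format)."""
    time_ms: int   # time of the event in milliseconds
    kind: str      # "insert" or "delete"
    position: int  # character offset in the text before the event
    text: str      # the characters inserted or deleted


def apply_event(text: str, event: KeyEvent) -> str:
    """Return the text that results from applying one event to `text`."""
    if event.kind == "insert":
        return text[:event.position] + event.text + text[event.position:]
    if event.kind == "delete":
        end = event.position + len(event.text)
        if text[event.position:end] != event.text:
            raise ValueError("deleted characters do not match the current text")
        return text[:event.position] + text[end:]
    raise ValueError(f"unknown event kind: {event.kind!r}")


def replay(events: List[KeyEvent]) -> List[str]:
    """Replay events in chronological order; return every intermediate text version."""
    versions = [""]
    for event in sorted(events, key=lambda e: e.time_ms):
        versions.append(apply_event(versions[-1], event))
    return versions


def is_nonlinear(text_before: str, event: KeyEvent) -> bool:
    """True if the event changes the text somewhere other than at its current end."""
    if event.kind == "insert":
        return event.position < len(text_before)
    return event.position + len(event.text) < len(text_before)


# Invented example log: the writer replaces "cat" with "dog" after finishing the sentence.
EXAMPLE_LOG = [
    KeyEvent(time_ms=0, kind="insert", position=0, text="The cat"),
    KeyEvent(time_ms=900, kind="insert", position=7, text=" sleeps"),
    KeyEvent(time_ms=2500, kind="delete", position=4, text="cat"),
    KeyEvent(time_ms=3100, kind="insert", position=4, text="dog"),
]

if __name__ == "__main__":
    for version in replay(EXAMPLE_LOG):
        print(repr(version))
```

Replaying the invented log yields five versions, from the empty text to "The dog sleeps". The final product contains no trace of "cat", whereas the replayed versions do; the last two events are flagged by `is_nonlinear` because they modify the text away from its end.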
