Abstract

It is generally acknowledged that discourse markers are used differently in speech and writing, yet many general descriptions and most annotation frameworks are written-based and thus partially unfit to be applied to spoken corpora. This paper identifies the major distinctive features of discourse markers in spoken language, which can be associated with problems related to their scope and structure, their meaning, and their tendency to co-occur. The description is based on authentic examples and is followed by methodological recommendations on how to deal with these phenomena in more exhaustive, speech-friendly annotation models.

Highlights

  • Within the large literature on discourse markers (DMs), a number of studies investigate how uses differ according to the mode of communication

  • Written-based frameworks include Rhetorical Structure Theory or RST (Mann and Thompson, 1988), the Penn Discourse Treebank or PDTB 2.0 (Prasad et al., 2008) and the Cognitive approach to Coherence Relations or CCR (Sanders et al., 1992), although there are some recent endeavors to apply these frameworks to speech (e.g., Tonelli et al., 2010 for PDTB in spoken Italian)

  • This paper aims to identify the major characteristics of discourse markers that tend to differ between spoken and written language, and discusses the problems that they pose for corpus annotation

Summary

Introduction

Within the large literature on discourse markers (DMs), a number of studies investigate how uses differ according to the mode of communication (see for instance Chafe, 1982; Horowitz and Samuels, 1987; Castellà, 2004; Biber, 2006; López Serena and Borreguero Zuloaga, 2010). Informal conversations include a similar number of subordinators in absolute terms and in relation to the number of words, but fewer if we consider the total number of clauses. This puzzling fact (the paradox of complexity, as Castellà names it) has to do with a tendency toward more subordinators that follows from two different preferred strategies: a verbal style, typical of speech, implies connection between verbal units; an integrated style, typical of writing, relies on sentential connection. The analysis of spoken texts shows that the use of DMs tends to exhibit some relative differences or tendencies. This is especially the case in spontaneous speech, where planning is low, and in dialogue, where interactivity is high. This data will be used to quantify the characteristics under discussion in this paper, comparing them, where available, with results from written corpora (mostly the PDTB corpus of the Wall Street Journal; Prasad et al., 2008).
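The point about subordinator counts depends on which denominator is used, and a small calculation may make this easier to see. The sketch below is illustrative only: the counts are invented for the example and do not come from any corpus or from the paper, and the helper name `rate` is hypothetical.

```python
def rate(count: int, total: int, per: int) -> float:
    """Return occurrences of an item per `per` units (words or clauses)."""
    return count * per / total


# Hypothetical counts, invented purely for illustration (not corpus data).
# Both samples have the same number of subordinators and words, but the
# conversation is split into more (shorter) clauses.
conversation = {"subordinators": 50, "words": 1000, "clauses": 250}
academic = {"subordinators": 50, "words": 1000, "clauses": 100}

for name, sample in (("conversation", conversation), ("academic", academic)):
    per_1000_words = rate(sample["subordinators"], sample["words"], 1000)
    per_100_clauses = rate(sample["subordinators"], sample["clauses"], 100)
    print(f"{name}: {per_1000_words:.1f} per 1000 words, "
          f"{per_100_clauses:.1f} per 100 clauses")
```

With these made-up figures the two samples look identical when normalized by words but diverge when normalized by clauses, which is the shape of the pattern described above. Real corpus figures would of course differ.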

DMs in Unplanned Speech
One Marker with Simultaneous Functions in the Same Context
Co-occurrence of DMs
Findings
This research is in line with the IS1312 COST Action “TextLink
