Abstract

This paper reports on the construction of the Sydney Corpus of Television Dialogue (SydTV). SydTV comprises approximately 275,000 words of dialogue from sixty-six episodes of recent US American fictional television series. The paper first provides a brief overview of existing TV dialogue corpora and then outlines the basic corpus composition, the corpus design principles, and the processes of data collection and storage. SydTV is a small, specialised corpus designed to be representative of fictional US TV dialogue. TV dialogue is defined as the dialogue uttered on screen by actors as they perform characters in fictional TV series. The corpus is fairly balanced: it contains 116,295 words from drama genres and 158,779 words from comedy genres, and 135,887 words from ‘quality’ and 139,187 words from ‘mainstream’ TV series. It also includes a mix of episode types in terms of textual time (pilot episodes, final episodes, and episodes occurring at the beginning, middle or end of a season). The corpus is available for educational (teaching and research) purposes through an online interface² and has a companion website³ where frequency lists are provided.
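To make the balance claim concrete, the subcorpus sizes can be summed and expressed as proportions. The short Python sketch below uses only the word counts stated in this abstract; the names `GENRE`, `TV_TYPE` and `shares` are illustrative and not part of any SydTV tooling.

```python
# Word counts per subcorpus, as reported in the abstract.
# The variable and function names here are illustrative only.
GENRE = {"drama": 116_295, "comedy": 158_779}
TV_TYPE = {"quality": 135_887, "mainstream": 139_187}


def shares(counts: dict[str, int]) -> dict[str, float]:
    """Return each category's share of the total word count (0 to 1)."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


if __name__ == "__main__":
    for label, split in (("genre", GENRE), ("TV type", TV_TYPE)):
        total = sum(split.values())
        pct = ", ".join(f"{k} {v:.1%}" for k, v in shares(split).items())
        print(f"{label}: total {total:,} words; {pct}")
```

Both splits sum to the same total of 275,074 words, and running the script reports drama 42.3% versus comedy 57.7%, and quality 49.4% versus mainstream 50.6%.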
