The corpus as a portal to the study of (socio)linguistic variation*

Shana Poplack

doi:10.4000/corpus.5422

Abstract

This article details the data management principles and practices developed by the University of Ottawa Sociolinguistics Lab (http://www.sociolinguistics.uottawa.ca/thelab.html), home to 19 major corpora representing hundreds of hours and millions of words of recorded everyday speech. Couched within the variationist framework for linguistic analysis, it provides a practical overview of tried-and-true methods for corpus construction, including data collection, transcription, annotation, and citation, as well as data retrieval, coding, and analysis. It also offers observations on data preservation and the data lifecycle, and discusses the ethical considerations involved in collecting and analyzing vernacular speech. It concludes with a summary of the wide variety of linguistic applications to which properly managed spontaneous speech data can be put.

Full Text