Abstract

This chapter provides an overview of existing Gaelic textual corpora and archives, and reports on a new textual corpus for Scottish Gaelic, Corpas na Gaidhlig, currently being compiled at the University of Glasgow. It includes a corpus-based study of the use of singular nouns with the numerals 'three' to 'ten' in Scottish Gaelic.

Keywords: Scottish Gaelic, Gaelic corpora, Corpas na Gaidhlig, numerals, singular nouns

The purpose of my contribution to this volume is threefold: firstly, to give an overview of existing Gaelic textual corpora and archives; secondly, to describe briefly the textual basis for the first phase in the creation of a new comprehensive textual corpus for Scottish Gaelic at the University of Glasgow; and thirdly, to illustrate with one case study how the use of textual corpora, even relatively small corpora, can provide valuable new insights into linguistic patterning in Gaelic. The case study to be considered concerns the use of singular nouns with the numerals 'three' to 'ten' in Scottish Gaelic, an understudied area in Scottish Gaelic grammar.

Scottish Gaelic language scholars work in an entirely different environment from English scholars in terms of the resources and workforce available to them. This is particularly true when it comes to online corpora, which have been available to English scholars since the early 1960s. Anderson and Corbett's Exploring English with Online Corpora refers to no fewer than eighteen substantial corpora for different varieties and registers of English.1 In stark contrast, there are no major publicly available online corpora for Scottish Gaelic. Existing online text and corpus resources for Scottish and Irish Gaelic may be succinctly described as follows.

The Language Engineering Resources for the Indigenous Minority Languages of the British Isles (BIML) project at Lancaster University has a small (c. 17,500-word) spoken corpus of Scottish Gaelic, which includes a conversation, a university lecture, two sermons, and an informal talk.2 The Scottish Corpus of Texts & Speech (SCOTS) project at the University of Glasgow includes a small number of Gaelic texts.3 Ciaran O Duibhin's Tobar na Gaedhilge is a searchable textbase of over 3.5 million words from twentieth-century Gaelic texts (mostly Irish but including some Scottish Gaelic).4 A selection of online digitised Gaelic texts may be found in Corpus de Theacsaichean Gaidhlig, hosted by Sabhal Mor Ostaig, Isle of Skye.5 The Gaelic Speech Recognition and the Scots Gaelic Sound Archive project at the University of Edinburgh has produced a corpus of transcribed Gaelic radio speech containing c. 70,000 words; it also includes c. 200,000 words of Gaelic news text captured from the web, but this corpus is not publicly available.6 Kevin P. Scannell's Crubadan [lit. 'crawler'] project, which seeks to build corpora for under-resourced languages using web-crawling software, has created a corpus of 999,396 Scottish Gaelic words, although the full-text corpus is not available for download.7 The National Library of Scotland has embarked upon a major project, as part of the Internet Archive project, to digitize c. 3000 out-of-copyright Gaelic printed works. Several hundred of these are available in a variety of formats including, unfortunately, unproofed text.8 While there are no plans for these to be proofread, they are potentially of use for corpus projects, and have with permission been utilized by the DASG project referred to below.
William Lamb created a private corpus of 82,677 words for his 2002 PhD thesis, comprising a spoken subcorpus of conversation, radio interview, sports broadcast, and traditional narrative, and a written subcorpus of academic prose, fiction, popular writing, and radio news scripts; this, the only tagged corpus in existence for Scottish Gaelic, is not publicly available.9 Elsewhere Lamb calls for a large corpus of conversational Scottish Gaelic to be created along the lines of the British National Corpus for English. …
