HKCAC: The Hong Kong Cantonese Adult Language Corpus

Man-Tak Leung, Sam-Po Law

doi:10.1075/ijcl.6.2.06leu

Abstract

An adult language corpus of spoken Hong Kong Cantonese (HKCAC) has recently been developed, consisting of spontaneous speech recorded from phone-in programs and forums on the radio in Hong Kong. The database represents the speech of a total of sixty-nine speakers in addition to the program hosts, and contains approximately 170,000 characters. It is believed that HKCAC will be of great value to linguists who are interested in studying Cantonese, and to speech therapists and educators who work with the Cantonese-speaking population. A search engine with a user-friendly interface has also been developed using FileMaker Pro 4.0 (Chinese version). Apart from providing basic frequency information and displaying search results in KWAL (Key Word And Line) format, the search engine allows users to search for the various phonetic realizations of a particular character or for the set of characters associated with a particular syllable. The content and structure of the corpus, the overall architecture, and the technical aspects of the search engine are described. Search procedures are illustrated with examples. The paper ends with a discussion of the future development of HKCAC.

Full Text