Abstract

An adult language corpus of spoken Hong Kong Cantonese (HKCAC) has recently been developed. It consists of spontaneous speech recorded from radio phone-in programs and forums in Hong Kong. The database represents the speech of sixty-nine speakers in addition to the program hosts and contains approximately 170,000 characters. HKCAC is expected to be of great value to linguists studying Cantonese and to speech therapists and educators who work with the Cantonese-speaking population. A search engine with a user-friendly interface has also been developed using FileMaker Pro 4.0 (Chinese version). In addition to basic frequency information and the display of search results in KWAL (Key Word And Line) format, the search engine allows users to search for the various phonetic realizations of a particular character or for the set of characters associated with a particular syllable. The paper describes the content and structure of the corpus and the overall architecture and technical aspects of the search engine. Search procedures are illustrated with examples. The paper ends with a discussion of the future development of HKCAC.
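
The three kinds of search named in the abstract (KWAL display, the phonetic realizations of a character, and the characters associated with a syllable) can be sketched in a few lines of Python. This is only an illustration of the idea, not the FileMaker Pro implementation: the toy utterances are invented and are not taken from HKCAC, the Jyutping-style romanization is an assumption about how syllables might be recorded, and the names `kwal` and `build_indexes` are hypothetical.

```python
from collections import defaultdict

# Toy utterances invented for illustration; NOT data from HKCAC.
# Each token pairs a character with the syllable as it was pronounced
# (Jyutping-style romanization is assumed here).
UTTERANCES = [
    [("你", "nei5"), ("好", "hou2")],
    [("你", "lei5"), ("好", "hou2"), ("嗎", "maa3")],
    [("我", "ngo5"), ("好", "hou2")],
]


def kwal(utterances, keyword):
    """Key Word And Line: return each whole line that contains the keyword."""
    hits = []
    for tokens in utterances:
        text = "".join(ch for ch, _ in tokens)
        if keyword in text:
            hits.append(text)
    return hits


def build_indexes(utterances):
    """Build two lookups: character -> syllables, and syllable -> characters."""
    char_to_syll = defaultdict(set)
    syll_to_char = defaultdict(set)
    for tokens in utterances:
        for ch, syl in tokens:
            char_to_syll[ch].add(syl)
            syll_to_char[syl].add(ch)
    return char_to_syll, syll_to_char


char_to_syll, syll_to_char = build_indexes(UTTERANCES)

print(kwal(UTTERANCES, "你"))        # lines containing the keyword
print(sorted(char_to_syll["你"]))    # realizations recorded for one character
print(sorted(syll_to_char["hou2"]))  # characters recorded for one syllable
```

Keeping two inverted indexes makes both directions of lookup (character to syllables, syllable to characters) a single dictionary access, which is one plausible way to support the searches described.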
