Voice across Hispanic America: a telephone speech corpus of American Spanish

Y Muthusamy, J Picone, J Godfrey, B Wheatley, E Holliman

doi:10.1109/icassp.1995.479279

Abstract

As part of the Polyphone project, Texas Instruments is collecting and developing a corpus of telephone speech in American Spanish. The corpus, called Voice Across Hispanic America (VAHA), aims to provide balanced phonetic coverage of the language and to include widely used vocabulary items such as digits, letter strings, yes/no responses, proper names, and selected command words and phrases used in automated telephone service applications. The speakers are native speakers of Spanish living in the United States. Collection and development of the corpus are expected to be completed by June 1995. So far, the authors have collected speech from about 500 speakers in various parts of the U.S. They describe the design issues in several aspects of the project: subject recruitment, corpus and prompt sheet design, the data acquisition system, and validation and transcription. They conclude with a brief statistical profile of the data collected.

Full Text