Abstract

With the increase on the demands for code-switching automatic speech recognition (ASR), the design and development of a code-switching speech database becomes highly desirable. However, it is not easy to collect sufficient code-switched utterances for model training for code-switching ASR. This study presents the procedure and experience for the design and development of a Chinese-English COde-switching Speech database (CECOS). Two different methods for collecting Chinese-English code-switched utterances are employed in this work. The applications of the collected database are also introduced. The CECOS database not only contains the speech data with code-switch properties but also accents due to non-native speakers. This database can be applied to several applications, such as code-switching speech recognition, language identification, named entity detection, etc.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call