Abstract

Despite the rapid development of speech technologies, most achievements to date concern a few major languages, e.g., English and Chinese. Unfortunately, most of the world's languages are 'minority languages', in the sense that they are spoken by a small population and have accumulated only limited resources. Since present speech technologies mostly rely on big data, partly due to the profound impact of deep learning, they are not directly applicable to minority languages. However, minority languages are so numerous and important that they must be taken seriously into account if we want to break the language barrier. Recently, the Chinese government approved a fundamental research project on minority languages in China: Multilingual Minorlingual Automatic Speech Recognition (M2ASR). Although the initial goal was speech recognition, the project's ambition goes further: it intends to publish all of its achievements and make them freely available to the research community, including speech and text corpora, phone sets, lexicons, tools, recipes and prototype systems. In this paper, we describe the project, report its first-year progress, and present the plan for future work.
