Abstract

This chapter provides an overview of available language resources from both U.S. and European perspectives. It introduces multilingual data repositories as well as large ongoing and planned collection efforts, and describes the major challenges of such efforts, such as transcription issues due to inconsistent writing standards, subject recruitment, recording equipment, legal aspects, and costs in terms of time and money. The overview of multilingual resources covers multilingual audio and text data, pronunciation dictionaries, and parallel bilingual/multilingual corpora. On the European side, a number of projects have been working toward the production of multilingual speech and language resources, many of which have become key databases for the human language technology (HLT) community. The SpeechDat projects are a set of speech data-collection efforts funded by the European Commission with the aim of establishing databases for the development of voice-operated teleservices and speech interfaces. The resulting databases are available via the European Language Resources Association (ELRA).
