Abstract

The abundance of openly available audio data in English enables pretraining of automatic speech recognition (ASR) systems on hundreds of thousands to millions of hours of recorded speech. As a result, speech recognition systems are approaching human-level robustness in English. For other languages, performance in multilingual speech recognition systems tends to be proportional to the amount of data included from the language, or the language family, in question. For low- to mid-resource languages with fewer speakers, the amount of openly available data may be limited, and as a consequence these languages tend to be underrepresented in large-scale efforts to train multilingual speech recognition systems. The National Library of Sweden hosts large collections of audio recordings. These resources potentially allow us to bridge some of the existing speech recognition performance gaps between Swedish and higher-resource languages. We believe Swedish speech recognition can be further improved by scaling up the amount of training data. We have constructed an inclusive and transparent speech corpus, with emphasis on all variations of spoken Swedish, that we will use to train speech-to-text models for Swedish.
