Abstract

A two-year-old has heard approximately 1000 h of speech; by the age of ten, the figure is around ten thousand. Automatic speech recognisers are often trained on data of a similar magnitude. In stark contrast, only a few databases for training a speaker analysis system contain more than 10 h of speech, and hardly any contain more than 100 h. Yet these systems are expected to recognise the states and traits of speakers independently of the person, spoken content, language, cultural background, and acoustic disturbances, ideally at human parity or even superhuman levels. While this has not yet been reached for many tasks, such as speaker emotion recognition, deep learning, which is often described as bringing significant improvements, holds the promise of reaching this goal when combined with sufficient learning data. Luckily, every second, more than 5 h of video are uploaded to the web, and several hundred hours of audio and video communication take place in most languages of the world. A major effort could thus be invested in the efficient labelling and sharing of these data. In this contribution, benchmarks are first given from the nine research challenges co-organised by the authors at the annual Interspeech conference since 2009. Then, approaches to the most efficient possible exploitation of the available ‘big’ (unlabelled) data are presented. Small-world modelling in combination with unsupervised learning helps to rapidly identify potential target data of interest. Further, gamified crowdsourcing combined with human-machine cooperative learning turns the annotation process into an entertaining experience while reducing the manual labelling effort to a minimum. Moreover, increasingly autonomous deep holistic end-to-end learning solutions are presented for the tasks at hand. The concluding discussion contains some crystal-ball gazing alongside practical hints, without neglecting ethical aspects.
