Abstract

We have recently proposed a universal acoustic characterisation to foreign accent recognition, in which any spoken foreign accent was described in terms of a common set of fundamental speech attributes. Although experimental evidence demonstrated the feasibility of our approach, we belive that speech attributes, namely manner and place of articulation, can be better modelled by a deep neural network. In this work, we propose the use of deep neural network trained on telephone bandwidth material from different languages to improve the proposed universal acoustic characterisation. We demonstrate that deeper neural architectures enhance the attribute classification accuracy. Furthermore, we show that improvements in attribute classification carry over to foreign accent recognition by producing a 21% relative improvement over previous baseline on spoken Finnish, and a 5.8% relative improvement on spoken English. Index Terms: Deep neural networks, data-driven speech attributes, manner of articulation, place of articulation, i-Vector, foreign accent recognition

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call