Abstract

In dialect speech recognition, the tonal features of a dialect vary in both pitch frequency and tone duration. Recognition benefits from a representation that accounts for these frequency and duration changes in an effective and meaningful way. In this paper, we propose a method for learning such a representation from a set of unlabeled speech sentences whose dialect features vary in pitch frequency and duration. Topographic independent component analysis (TICA) is applied as the unsupervised learning method, and the emergent result is a topographic matrix made up of basis components. The learned representation is topographic in the following sense: the basis components, which serve as the units of the speech, are ordered in the feature matrix so that components of one dialect are grouped along one axis, while changes across time windows are accounted for along the other axis. This provides a meaningful set of basis vectors that may be used to construct dialect subspaces for dialect speech recognition.
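
To make the idea of a topographic matrix of basis components concrete, the following is a minimal sketch of topographic ICA in NumPy. It is not the authors' implementation. It assumes the standard gradient formulation of TICA with a square-root energy nonlinearity, and it runs on synthetic Laplacian sources standing in for speech features. The 4 x 4 grid, the wrap-around neighborhood, the learning rate, the iteration count, and all function names (`whiten`, `grid_neighborhood`, `sym_orthogonalize`, `tica`) are illustrative assumptions.

```python
import numpy as np

def whiten(X):
    """Center and whiten X of shape (n_features, n_samples)."""
    X = X - X.mean(axis=1, keepdims=True)
    cov = X @ X.T / X.shape[1]
    eigval, eigvec = np.linalg.eigh(cov)
    V = eigvec @ np.diag(1.0 / np.sqrt(eigval + 1e-12)) @ eigvec.T
    return V @ X, V

def grid_neighborhood(rows, cols, width=1):
    """h[i, j] = 1 if components i and j lie within `width` on a 2-D torus grid."""
    r, c = np.divmod(np.arange(rows * cols), cols)
    dr = np.abs(r[:, None] - r[None, :])
    dc = np.abs(c[:, None] - c[None, :])
    dr = np.minimum(dr, rows - dr)  # wrap around vertically
    dc = np.minimum(dc, cols - dc)  # wrap around horizontally
    return ((dr <= width) & (dc <= width)).astype(float)

def sym_orthogonalize(W):
    """Return (W W^T)^(-1/2) W, computed through the SVD of W."""
    U, _, Vt = np.linalg.svd(W, full_matrices=False)
    return U @ Vt

def tica(Z, H, n_iter=200, lr=0.1, eps=1e-4, seed=0):
    """Gradient ascent on sum_i G(sum_j h(i,j) (w_j^T z)^2), G(u) = -sqrt(eps + u).

    Z is whitened data of shape (n, T); H is the (n, n) neighborhood matrix.
    W is kept orthogonal by symmetric orthogonalization after every step.
    """
    rng = np.random.default_rng(seed)
    n, T = Z.shape
    W = sym_orthogonalize(rng.standard_normal((n, n)))
    for _ in range(n_iter):
        Y = W @ Z                            # component outputs, shape (n, T)
        E = H @ (Y ** 2)                     # local energy of each neighborhood
        R = H.T @ (1.0 / np.sqrt(eps + E))   # neighborhood-pooled weighting
        grad = -(Y * R) @ Z.T / T            # gradient of the objective w.r.t. W
        W = sym_orthogonalize(W + lr * grad)
    return W

# Synthetic stand-in for speech features (assumption: 16 features, 2000 frames).
rng = np.random.default_rng(0)
S = rng.laplace(size=(16, 2000))             # sparse sources
A = np.eye(16) + 0.3 * rng.standard_normal((16, 16))
X = A @ S                                    # observed mixtures

Z, V = whiten(X)
W = tica(Z, grid_neighborhood(4, 4, width=1))
basis = np.linalg.pinv(W @ V)                # columns are basis vectors in input space
```

Column `k` of `basis` corresponds to grid position `divmod(k, 4)`, so reshaping the component index onto the 4 x 4 grid gives the kind of two-axis layout the abstract describes. Which axis ends up capturing which kind of variation depends on the data and is not determined by this sketch.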
