Abstract
We propose an information-theoretic algorithm that selects regions from real-time magnetic resonance imaging (rtMRI) video frames for a broad phonetic class recognition task. Representations derived from these optimal regions are used as articulatory features for recognition. A set of connected, arbitrarily shaped regions is selected such that the articulatory features computed from them provide maximal information about the broad phonetic classes. We also propose a tree-structured greedy region splitting algorithm that further segments these regions so that articulatory features from the split regions carry additional information about the phonetic classes. We find that some of the proposed articulatory features correlate well with the articulatory gestures posited by the Articulatory Phonology theory of speech production. A broad phonetic class recognition experiment using four rtMRI subjects reveals that the recognition accuracy with the optimal split regions is, on average, higher than that obtained using only acoustic features. Combining acoustic and articulatory features further reduces the error rate by ∼8.25% (relative).
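The selection criterion described above — picking regions whose features are maximally informative about the phonetic classes — can be illustrated with a minimal sketch. This is not the authors' algorithm: the function names, the per-frame discrete feature representation, and the simple greedy loop (which ignores redundancy between already-selected regions) are all assumptions made for illustration of a mutual-information-based selection step.

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y) in bits between two
    equal-length sequences of discrete values."""
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    pxy = Counter(zip(xs, ys))
    return sum((c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

def select_regions(region_features, labels, k):
    """Greedily pick k regions whose discretized per-frame features
    are most informative about the broad phonetic class labels.

    region_features: dict region_id -> list of discrete feature
    values, one per video frame.  labels: one broad phonetic class
    label per frame.  Hypothetical interface; a simplification that
    scores each region independently, without the redundancy terms
    a full selection criterion would need.
    """
    chosen = []
    remaining = dict(region_features)
    for _ in range(k):
        best = max(remaining,
                   key=lambda r: mutual_information(remaining[r], labels))
        chosen.append(best)
        del remaining[best]
    return chosen
```

For example, a region whose feature sequence tracks the class labels exactly carries one full bit of information about a binary class split and would be chosen over a constant (uninformative) region.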