Abstract

The goal of this project is to determine the accuracy and processing speed of different approaches for mapping time-varying articulatory positional data to vowels. Three widely used classifiers were compared on two datasets: one from a single speaker and one from multiple speakers. The single-speaker dataset was acquired using the Articulograph AG500. The multiple-speaker dataset was obtained from seven speakers in the X-ray Microbeam Speech Production Database (Westbury, 1994). The recognition rate for the single-speaker dataset (eight English vowels) ranged from 94.25% to 98.1%, and from 62.38% to 99.35% for the multiple-speaker dataset. For the single-speaker dataset, recognition accuracy was comparable across classifiers. For the multiple-speaker dataset, recognition accuracy was better for the Support Vector Machine and C4.5 than for the neural network. The decision tree generated by C4.5 was consistent with the articulatory features commonly used to distinguish vowels descriptively. Moreover, the Support Vector Machine and C4.5 ran much faster than the neural network. The high recognition rates observed suggest that static vowels can be accurately recognized from articulatory position time-series data. The findings are intended to improve the accuracy and response time of a real-time, articulatory-movement-based speech synthesizer.
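
As a rough illustration of the comparison the abstract describes, the sketch below trains the three classifier families on articulatory position vectors and reports accuracy and training time. It is not the authors' code: the data are synthetic stand-ins for real sensor trajectories, scikit-learn's DecisionTreeClassifier (CART) is used as an approximation of C4.5, and MLPClassifier stands in for the neural network.

```python
# Minimal sketch of the classifier comparison (synthetic data, not the
# study's datasets). Each "frame" is a vector of articulator coordinates.
import time
import numpy as np
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_vowels, n_frames, n_features = 8, 400, 12  # e.g. x/y coords of 6 sensors

# One Gaussian cluster of articulatory frames per vowel class.
X = np.vstack([rng.normal(loc=i, scale=1.0, size=(n_frames, n_features))
               for i in range(n_vowels)])
y = np.repeat(np.arange(n_vowels), n_frames)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                          random_state=0)

classifiers = [
    ("SVM", SVC(kernel="rbf")),
    ("DecisionTree (CART, as a C4.5 proxy)", DecisionTreeClassifier()),
    ("NeuralNet", MLPClassifier(hidden_layer_sizes=(32,), max_iter=500)),
]

for name, clf in classifiers:
    t0 = time.perf_counter()
    clf.fit(X_tr, y_tr)                     # training time is what differs most
    acc = clf.score(X_te, y_te)             # recognition rate on held-out frames
    print(f"{name}: accuracy={acc:.3f}, "
          f"train time={time.perf_counter() - t0:.2f}s")
```

On real data, the synthetic frames would be replaced with per-frame sensor coordinates (e.g., tongue, jaw, and lip positions), and the multiple-speaker condition would call for held-out-speaker evaluation rather than a random split.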
