Abstract

This paper presents a real-time speech-driven talking face system with low computational complexity and smooth visual output. A novel embedded confusable system is proposed to generate an efficient phoneme-viseme mapping table. The table is built by grouping phonemes with the Houtgast similarity approach, based on viseme similarity estimated with a histogram distance, following the idea that some visemes are visually ambiguous. The resulting mapping table simplifies the mapping problem and improves viseme classification accuracy. The implemented system comprises three stages: 1) speech signal processing, which includes SNR-aware speech enhancement for noise reduction and ICA-based feature extraction for robust acoustic feature vectors; 2) recognition network processing, which combines HMM and MCSVM for phoneme recognition and viseme classification. HMM handles sequential inputs well, while MCSVM classifies with good generalization, especially when samples are limited. The MCSVM uses the phoneme-viseme mapping table to decide which viseme class the observation sequence produced by the HMM belongs to; 3) visual processing, which arranges the lip-shape images of the visemes in time sequence and makes the result look more realistic through dynamic alpha blending with different alpha value settings. In the experiments, the speech signal processing stage, evaluated on noisy speech in comparison with clean speech, gives reported accuracy improvements of 1.1 % (16.7 % to 15.6 %) and 4.8 % (30.4 % to 35.2 %) in PER and WER, respectively. For viseme classification, the error rate decreases from 19.22 % to 9.37 %. Finally, we simulated GSM communication between a mobile phone and a PC, and used mean opinion scores to rate the visual quality and the speech-driven feel.
Overall, the method uses confusable sets to reduce the number of visemes and lip-shape images, which enables real-time operation.
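
The visual processing stage can be illustrated with a minimal sketch of alpha blending between two lip-shape images. This is not the authors' implementation: the abstract does not give the alpha schedule, the image format, or the frame count, so the function names (`blend_frame`, `viseme_transition`), the grayscale list-of-rows image representation, and the alpha values below are all assumptions chosen for illustration.

```python
def blend_frame(src, dst, alpha):
    """Blend two equally sized grayscale images (lists of rows of 0-255 ints).

    alpha = 0.0 returns the source viseme image, alpha = 1.0 the destination.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [
        [round((1.0 - alpha) * s + alpha * d) for s, d in zip(src_row, dst_row)]
        for src_row, dst_row in zip(src, dst)
    ]


def viseme_transition(src, dst, alphas):
    """Produce one intermediate frame per alpha value (hypothetical schedule)."""
    return [blend_frame(src, dst, a) for a in alphas]


# Toy 2x2 "lip images": a dark frame and a brighter frame (illustrative data only).
closed_lips = [[0, 0], [0, 0]]
open_lips = [[200, 200], [200, 200]]

# Assumed alpha schedule; a real system might vary it with the viseme duration.
frames = viseme_transition(closed_lips, open_lips, alphas=[0.25, 0.5, 0.75])
```

Varying the list passed as `alphas` changes how quickly one lip shape fades into the next, which is one plausible reading of "dynamic" alpha blending with different alpha value settings.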

