Abstract

Speech synchronization is a widely studied topic in facial animation, yet several challenges remain open, one of which is producing realistic synchronization. Because visual phonemes (visemes) differ across languages, it is very difficult to build a speech synchronization tool that works for all languages, and at present no such tool gives good results for the Indonesian language. This study proposes incorporating co-articulation into speech synchronization to produce more realistic animation, together with a viseme mapping based on the consonant-vowel (CV) syllable pattern of Indonesian. The mapping yields more specific viseme groups and thereby supports the development of a realistic speech synchronization system, hereafter called Bahasa Speech Sync. Co-articulation is computed using Kochanek-Bartels spline interpolation, which adds tension, bias, and continuity parameters, with 4 control points taken from videos of real human speakers. Viseme mapping is done by comparing, for each syllable, the distances between 12 crucial points and a reference point. Using the proposed viseme grouping procedure, we reduce the visemes generated from combinations of 21 consonants and 5 vowels to 24 viseme groups, 18 of which represent the start position and 6 the end position. In tests of the similarity between the movement of the generated animation and that of real human videos, the animation achieved an 89% “realistic” perception under our proposed distance criteria.
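
The abstract does not give the exact spline formulation used, so the following is only a minimal sketch of the standard textbook Kochanek-Bartels segment, assuming that the 4 control points are consecutive positions of one facial feature point and that the curve is evaluated between the two middle points. The function name `kb_segment` and the sample coordinates are hypothetical and are not taken from the paper.

```python
def kb_segment(p0, p1, p2, p3, s, tension=0.0, bias=0.0, continuity=0.0):
    """Evaluate a Kochanek-Bartels segment between p1 and p2 at s in [0, 1].

    p0..p3 are equal-length coordinate tuples (hypothetical control points).
    """
    t, b, c = tension, bias, continuity

    # Weights for the outgoing tangent at p1 and the incoming tangent at p2.
    out_prev = (1 - t) * (1 + b) * (1 + c) / 2
    out_next = (1 - t) * (1 - b) * (1 - c) / 2
    in_prev = (1 - t) * (1 + b) * (1 - c) / 2
    in_next = (1 - t) * (1 - b) * (1 + c) / 2

    # Cubic Hermite basis functions.
    h00 = 2 * s**3 - 3 * s**2 + 1
    h10 = s**3 - 2 * s**2 + s
    h01 = -2 * s**3 + 3 * s**2
    h11 = s**3 - s**2

    point = []
    for x0, x1, x2, x3 in zip(p0, p1, p2, p3):
        d1 = out_prev * (x1 - x0) + out_next * (x2 - x1)  # tangent leaving p1
        d2 = in_prev * (x2 - x1) + in_next * (x3 - x2)    # tangent entering p2
        point.append(h00 * x1 + h10 * d1 + h01 * x2 + h11 * d2)
    return tuple(point)


# Hypothetical lip-corner positions across four key frames.
frames = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
midpoint = kb_segment(*frames, s=0.5)
```

With tension, bias, and continuity all set to 0 the segment reduces to a Catmull-Rom spline. Raising tension shortens the tangents and tightens the curve, while bias and continuity shift and break the tangent direction at each key frame, which is what gives room to shape the transition between neighbouring visemes.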
