Abstract

Lip synchronization, also known as visual speech animation, is the process of matching speech with lip movements. Visual speech animation has a great impact on the gaming and animated film industries because it provides a realistic experience to users. Furthermore, this technology also supports better communication for deaf people. For most European languages, lip synchronization models have been developed and are used widely in the entertainment industry. However, no research experiments have yet been conducted on speech animation for the Sinhala language; limited research contributions and the unavailability of resources have been the main obstacles. This research addresses the problem of building a lip synchronization model for the Sinhala language. The project presents a study on how to map from acoustic speech to visual speech with the goal of generating perceptually natural speech animation. Experiments on developing a viseme alphabet were carried out using a static-viseme approach on a video dataset created by the author. The implemented lip synchronization model was evaluated subjectively across six different categories. The model built with the static-viseme approach achieved a 68.8% rating accuracy and a 70.8% ranking accuracy. It performs well on individual words and short sentences rather than on long sentences or sentences uttered at varying speaking rates.
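The static-viseme approach mentioned above reduces each phoneme to a single mouth shape (viseme), so a phoneme sequence becomes a sequence of key poses. A minimal sketch of that idea follows; the phoneme labels and viseme class names here are illustrative placeholders, not the viseme alphabet developed in the paper.

```python
# Hypothetical sketch of a static phoneme-to-viseme mapping.
# Each phoneme maps to exactly one viseme (mouth key pose); the
# animation is then driven by the resulting pose sequence.
# Labels below are illustrative, not the paper's Sinhala alphabet.
PHONEME_TO_VISEME = {
    "p": "bilabial_closed",
    "b": "bilabial_closed",
    "m": "bilabial_closed",
    "a": "open_wide",
    "i": "spread",
    "u": "rounded",
    "s": "teeth_narrow",
}

def phonemes_to_visemes(phonemes, default="neutral"):
    """Map a phoneme sequence to viseme key poses, collapsing
    consecutive duplicates so repeated shapes are held, not re-keyed."""
    visemes = []
    for ph in phonemes:
        v = PHONEME_TO_VISEME.get(ph, default)
        if not visemes or visemes[-1] != v:
            visemes.append(v)
    return visemes

print(phonemes_to_visemes(["m", "a", "m", "a"]))
# ['bilabial_closed', 'open_wide', 'bilabial_closed', 'open_wide']
```

Because the mapping is many-to-one (e.g. /p/, /b/, /m/ share one viseme), a static approach is simple but cannot capture coarticulation, which is one reason such models degrade on longer sentences and varying speaking rates.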
