Abstract

With the advent of language tools, the language barrier is dissolving fast. Content is converted from one language to others across platforms, especially in the entertainment industry, and it has become essential to make the synchronization of video and audio content during dubbing more natural. Many people around the world prefer listening to audio content in a language they know rather than reading it. Detecting the viseme sequences of words in a video and identifying words from another language that match the detected sequence is therefore a valuable application. In this paper we propose Idhazhi, an algorithm that suggests words, represented as phonemes, matching a given viseme sequence at four levels: perfect, optimized, semi-perfect and compacted. It does so by mapping the symbols of the IPA to specific oral positions of the lips, teeth and tongue. The system was tested by recording 50 videos, each of a tester pronouncing a single word, and playing them muted to a group of 12 evaluators who rated how relevant the words suggested by the system were to the viseme sequence in the video; the accuracy was 0.73 after approximations. Beyond dubbing, the algorithm has applications in security, for listing the words that may match a viseme sequence in footage such as CCTV recordings, and it could be extended to help persons with vocal disabilities by generating speech from their oral movements.
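
To make the matching idea concrete, the sketch below shows a minimal, hypothetical version of viseme-sequence matching at the strictest ("perfect") level: a toy table groups a few IPA-like phonemes into viseme classes by visible oral position, and candidate words are those whose viseme sequence equals the target's. The phoneme table, lexicon and function names are illustrative assumptions, not the paper's actual Idhazhi mapping or its optimized, semi-perfect and compacted levels.

```python
# Minimal sketch of viseme-sequence matching ("perfect" level only).
# The phoneme-to-viseme table below is a hypothetical toy grouping,
# not the Idhazhi mapping described in the paper.

PHONEME_TO_VISEME = {
    "p": "bilabial", "b": "bilabial", "m": "bilabial",
    "f": "labiodental", "v": "labiodental",
    "t": "alveolar", "d": "alveolar", "n": "alveolar",
    "i": "spread", "e": "spread",
    "o": "rounded", "u": "rounded",
    "a": "open",
}

def viseme_sequence(phonemes):
    """Map a word's phoneme list to its visible viseme sequence,
    collapsing consecutive repeats (adjacent identical visemes look the same)."""
    seq = []
    for p in phonemes:
        v = PHONEME_TO_VISEME.get(p)
        if v is not None and (not seq or seq[-1] != v):
            seq.append(v)
    return tuple(seq)

def perfect_matches(target_phonemes, lexicon):
    """Return lexicon words whose viseme sequence equals the target's
    (the strictest level of matching)."""
    target = viseme_sequence(target_phonemes)
    return [word for word, phones in lexicon.items()
            if viseme_sequence(phones) == target]

if __name__ == "__main__":
    # Tiny toy lexicon: word -> phoneme list (IPA-like symbols).
    lexicon = {
        "pat": ["p", "a", "t"],
        "bad": ["b", "a", "d"],
        "mat": ["m", "a", "t"],
        "fit": ["f", "i", "t"],
    }
    print(perfect_matches(["p", "a", "t"], lexicon))
```

Under this toy mapping the call prints ['pat', 'bad', 'mat'], since those words are visually indistinguishable on the lips; the bilabial ambiguity is exactly why a real system must return a ranked list of candidates rather than a single word.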
