Generating dynamic lip-syncing using target audio in a multimedia environment

Diksha Pawar, Prashant Borde, Pravin Yannawar

doi:10.1016/j.nlp.2024.100084

Abstract

This research addresses the challenging task of generating lip-synced facial videos that match a specified target speech segment. A novel deep-learning model, LipChanger, has been developed to produce precise synthetic lip movements corresponding to the speech extracted from an audio source. When the audio is replaced, portions of the visual data may fall out of sync with the new audio; this challenge is handled through a novel strategy that leverages insights from a robust lip-sync discriminator. Additionally, this study introduces new criteria and evaluation benchmarks for assessing lip synchronization in unconstrained videos. LipChanger demonstrates improved peak signal-to-noise ratio (PSNR) values, indicative of enhanced image quality. Furthermore, it exhibits highly accurate lip synthesis, as evidenced by lower landmark distance (LMD) values and higher structural similarity (SSIM) values. These outcomes suggest that the LipChanger approach holds significant potential for enhancing lip synchronization in talking face videos, resulting in more realistic lip movements. The proposed LipChanger model and its associated evaluation benchmarks show promise and could contribute to advancements in lip-sync technology for unconstrained talking face videos.
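
For readers unfamiliar with two of the metrics named in the abstract, the following is a minimal Python sketch, not the authors' evaluation code. It assumes PSNR is computed per grayscale frame from the mean squared error against a reference frame, and that LMD is the mean Euclidean distance between matching lip landmarks, as these metrics are commonly defined; the paper's exact protocol is not stated here. The function names psnr and landmark_distance, the 8-bit peak value of 255, and the sample data are illustrative assumptions.

```python
import math


def psnr(reference, generated, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized grayscale frames.

    Frames are nested lists of pixel intensities (hypothetical input format).
    Higher values mean the generated frame is closer to the reference.
    """
    ref = [p for row in reference for p in row]
    gen = [p for row in generated for p in row]
    if not ref or len(ref) != len(gen):
        raise ValueError("frames must be non-empty and the same size")
    mse = sum((a - b) ** 2 for a, b in zip(ref, gen)) / len(ref)
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(max_value ** 2 / mse)


def landmark_distance(reference_pts, generated_pts):
    """Mean Euclidean distance between matching (x, y) lip landmarks.

    Lower values mean the generated lip shape is closer to the reference.
    """
    if not reference_pts or len(reference_pts) != len(generated_pts):
        raise ValueError("landmark lists must be non-empty and the same length")
    total = sum(
        math.hypot(px - qx, py - qy)
        for (px, py), (qx, qy) in zip(reference_pts, generated_pts)
    )
    return total / len(reference_pts)


# Illustrative data only: a 2x2 frame pair and two lip landmarks.
frame_ref = [[10, 20], [30, 40]]
frame_gen = [[12, 18], [30, 44]]
print(psnr(frame_ref, frame_gen))
print(landmark_distance([(0, 0), (0, 0)], [(3, 4), (0, 0)]))  # 2.5
```

Under these definitions, a better lip-sync result shows a higher PSNR and a lower LMD, which matches the direction of improvement reported in the abstract. SSIM is omitted from the sketch because it needs windowed local statistics that do not fit in a few lines.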

Journal: Natural Language Processing Journal
Publication Date: Jun 10, 2024
License type: CC BY-NC-ND

Similar Papers

A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild
K R Prajwal ... C.V Jawahar
12 Oct 2020

Talking Face Generation by Conditional Recurrent Adversarial Network
Yang Song ... Dawei Li
01 Aug 2019

Visual dubbing pipeline with localized lip-sync and two-pass identity transfer
Dhyey Patel ... Tiberiu Popa
Computers & Graphics | VOL. 110
17 Nov 2022

Lip event detection using oriented histograms of regional optical flow and low rank affinity pursuit
Xin Liu ... Yuan Yan Tang
Computer Vision and Image Understanding | VOL. 148
27 May 2016
