Abstract

Audio description (AD) is an accessibility resource designed to improve access for blind or low-vision individuals by describing images and narrating actions and visual elements, such as scene details and aspects of the characters (e.g., age, gender, clothing). In general, however, AD is only generated for sections of the video that contain no dialogue. This prevents overlap with the video's dialogue, which could hinder the user's understanding rather than help it. Thus, one of the first steps in the AD generation process is to identify the speechless intervals, which are candidates to receive AD. In this work, we present a solution for the automatic identification of speechless intervals in digital videos using Convolutional Neural Networks (CNNs). Our proposal automates this step of the AD generation process, reducing the time and effort involved in generating AD; it could also be integrated into an automatic or semi-automatic audio description generation system. The results show that, considering a minimum confidence level of 0.5 for the output of the classification model, the solution obtained a balanced average accuracy of 93% in identifying speechless segments across all tested videos.
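
The sketch below illustrates the general pipeline the abstract describes: a CNN scores fixed-length audio windows as speech or non-speech, a 0.5 confidence threshold is applied, and adjacent non-speech windows are merged into candidate intervals for AD. The network layout, log-mel features, window length, and hop size are illustrative assumptions, not the authors' exact configuration.

```python
# A minimal sketch, assuming a window-level CNN speech classifier with a
# 0.5 confidence threshold. Architecture and feature parameters are
# assumptions for illustration, not the paper's reported setup.
import torch
import torch.nn as nn
import torchaudio


class SpeechClassifier(nn.Module):
    """Tiny CNN scoring a log-mel window: output is P(speech) in [0, 1]."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, 1),
        )

    def forward(self, x):
        return torch.sigmoid(self.net(x))


@torch.no_grad()
def speechless_intervals(wav_path, model, win_s=1.0, threshold=0.5):
    """Return merged (start_s, end_s) intervals whose windows score non-speech."""
    wave, sr = torchaudio.load(wav_path)          # (channels, samples)
    wave = wave.mean(dim=0)                       # downmix to mono
    hop = 512
    mel = torchaudio.transforms.MelSpectrogram(
        sample_rate=sr, n_fft=1024, hop_length=hop, n_mels=64)
    spec = mel(wave).clamp(min=1e-10).log()       # (n_mels, n_frames)
    step = max(1, int(win_s * sr / hop))          # spectrogram frames per window
    intervals, start = [], None
    for i in range(0, spec.shape[1] - step + 1, step):
        window = spec[:, i:i + step][None, None]  # (1, 1, n_mels, step)
        p_speech = model(window).item()
        t = i * hop / sr                          # window start time in seconds
        if p_speech < threshold:                  # confident "no speech"
            start = t if start is None else start
        elif start is not None:                   # speech resumed: close interval
            intervals.append((start, t))
            start = None
    if start is not None:                         # audio ends in a speechless run
        intervals.append((start, spec.shape[1] * hop / sr))
    return intervals
```

In practice the classifier would first be trained on labeled speech/non-speech windows; with an untrained model the scores are arbitrary and the 0.5 threshold carries no meaning.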
