Abstract

Audio description (AD) is an accessibility resource designed to improve access for blind or low-vision individuals by describing images and narrating actions and visual elements, such as scene details and aspects of the characters (e.g., age, gender, clothing). In general, however, AD is only generated for sections of the video that contain no dialogue. This prevents overlap with the video's dialogue, which could hinder the user's understanding rather than help it. Thus, one of the first steps in the AD generation process is to identify the speechless intervals, which are candidates to receive AD. In this work, we present a solution for the automatic identification of speechless intervals in digital videos using Convolutional Neural Networks (CNNs). Our proposal automates this step of the AD generation process, reducing the time and effort involved in generating AD; it could also be integrated into an automatic or semi-automatic audio description generation system. The results show that, considering a minimum confidence level of 0.5 for the output of the classification model, the solution obtained a balanced average accuracy of 93% in identifying speechless segments across all tested videos.
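
The sketch below illustrates the general pipeline the abstract describes: a CNN scores fixed-length audio windows as speech or non-speech, a 0.5 confidence threshold is applied, and adjacent non-speech windows are merged into candidate intervals for AD. The network layout, log-mel features, window length, and hop size are illustrative assumptions, not the authors' exact configuration.

```python
# A minimal sketch, assuming a window-level CNN speech classifier with a
# 0.5 confidence threshold. Architecture and feature parameters are
# assumptions for illustration, not the paper's reported setup.
import torch
import torch.nn as nn
import torchaudio


class SpeechClassifier(nn.Module):
    """Tiny CNN scoring a log-mel window: output is P(speech) in [0, 1]."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, 1),
        )

    def forward(self, x):
        return torch.sigmoid(self.net(x))


@torch.no_grad()
def speechless_intervals(wav_path, model, win_s=1.0, threshold=0.5):
    """Return merged (start_s, end_s) intervals whose windows score non-speech."""
    wave, sr = torchaudio.load(wav_path)          # (channels, samples)
    wave = wave.mean(dim=0)                       # downmix to mono
    hop = 512
    mel = torchaudio.transforms.MelSpectrogram(
        sample_rate=sr, n_fft=1024, hop_length=hop, n_mels=64)
    spec = mel(wave).clamp(min=1e-10).log()       # (n_mels, n_frames)
    step = max(1, int(win_s * sr / hop))          # spectrogram frames per window
    intervals, start = [], None
    for i in range(0, spec.shape[1] - step + 1, step):
        window = spec[:, i:i + step][None, None]  # (1, 1, n_mels, step)
        p_speech = model(window).item()
        t = i * hop / sr                          # window start time in seconds
        if p_speech < threshold:                  # confident "no speech"
            start = t if start is None else start
        elif start is not None:                   # speech resumed: close interval
            intervals.append((start, t))
            start = None
    if start is not None:                         # audio ends in a speechless run
        intervals.append((start, spec.shape[1] * hop / sr))
    return intervals
```

In practice the classifier would first be trained on labeled speech/non-speech windows; with an untrained model the scores are arbitrary and the 0.5 threshold carries no meaning.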
