Abstract
In this paper, we introduce the DEEP-HEAR framework, a multimodal dynamic subtitle positioning system designed to improve the accessibility of multimedia documents for deaf and hearing-impaired people (HIP). The proposed system exploits both computer vision algorithms and deep convolutional neural networks, specifically designed and tuned to detect the active speaker and recognize their identity. The main contributions of the paper are: a novel method for recognizing the various characters present in the video stream; a video temporal segmentation algorithm that divides the video sequence into semantic units based on face tracks and visual consistency; and, at the core of our approach, a novel active speaker recognition method relying on multimodal information fusion from the text, audio, and video streams. The experimental results, carried out on a large-scale dataset of more than 30 videos, validate the proposed methodology with average accuracy and recognition rates above 90%. Moreover, the method shows robustness to significant object/camera motion and face pose variation, yielding gains of more than 8% in precision and recall compared with state-of-the-art techniques. The subjective evaluation of the proposed dynamic subtitle positioning system demonstrates the effectiveness of our approach.
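The abstract does not specify the exact fusion rule used to combine the text, audio, and video cues, so the following is only a minimal illustrative sketch of one possible late-fusion scheme; the function name, candidate scores, and weights are hypothetical and not taken from the paper.

```python
def fuse_speaker_scores(face_scores, audio_scores, text_scores,
                        weights=(0.5, 0.3, 0.2)):
    """Combine per-candidate scores from the three modalities.

    Each argument maps a candidate identity to a score in [0, 1]; the
    candidate with the highest weighted sum is taken as the active speaker.
    The weights are assumed values, not the paper's tuned parameters.
    """
    w_face, w_audio, w_text = weights
    candidates = set(face_scores) | set(audio_scores) | set(text_scores)
    fused = {
        c: w_face * face_scores.get(c, 0.0)
           + w_audio * audio_scores.get(c, 0.0)
           + w_text * text_scores.get(c, 0.0)
        for c in candidates
    }
    return max(fused, key=fused.get), fused


# Example: the visual and audio cues agree that "Alice" speaks in this segment.
speaker, scores = fuse_speaker_scores(
    face_scores={"Alice": 0.9, "Bob": 0.4},
    audio_scores={"Alice": 0.7, "Bob": 0.5},
    text_scores={"Alice": 0.6},
)
```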
Highlights
The recent statistics published by the World Health Organization [1] show that hearing impairments become progressively more common worldwide among people aged over 50 years
In order to facilitate access to information and meet the needs of people with hearing disabilities, most TV broadcasters transmit and distribute, together with the audio and video signals, textual information presented in the form of video subtitles or closed captions
In order to evaluate the influence of each component of our system on speaker recognition performance, we considered for comparison: (1) an active speaker recognition strategy based solely on the face recognition module
Summary
The recent statistics published by the World Health Organization [1] show that hearing impairments become progressively more common worldwide among people aged over 50 years. In contrast with existing systems, where closed captions are always displayed at a fixed position at the bottom of the screen, our approach helps hearing-impaired users match the script with the corresponding character by positioning the subtitles in a manner that makes it possible to identify the active speaker. The DEEP-HEAR framework (Fig. 1) jointly exploits computer vision algorithms and deep convolutional neural networks (CNNs) to carry out the various stages required for this purpose, including face detection, tracking and recognition, video temporal segmentation, active speaker detection and recognition, background text detection, and subtitle positioning.
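The summary names subtitle positioning and background text detection as the final stages, but does not detail the placement rule. A minimal sketch of one plausible heuristic is shown below, assuming the active speaker's face bounding box and the detected background-text rectangles are already available; all names, sizes, and margins are illustrative assumptions rather than the paper's actual method.

```python
def position_subtitle(frame_w, frame_h, speaker_box, text_boxes,
                      sub_w=400, sub_h=60, margin=10):
    """Return the (x, y) top-left corner for the subtitle box.

    speaker_box and text_boxes are (x, y, w, h) rectangles. The caption is
    centred under the active speaker's face, clamped to the frame, and
    nudged downward while it overlaps any detected background text.
    """
    def overlaps(a, b):
        ax, ay, aw, ah = a
        bx, by, bw, bh = b
        return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

    sx, sy, sw, sh = speaker_box
    # Centre the caption horizontally under the face, keeping it in frame.
    x = min(max(sx + sw // 2 - sub_w // 2, margin), frame_w - sub_w - margin)
    y = min(sy + sh + margin, frame_h - sub_h - margin)

    # Slide the caption down while it collides with background text regions.
    while any(overlaps((x, y, sub_w, sub_h), t) for t in text_boxes) \
            and y + sub_h + margin < frame_h:
        y += margin

    return x, y


# Example: a 1280x720 frame, speaker's face near the left edge, one text region.
print(position_subtitle(1280, 720, (100, 200, 150, 150), [(80, 380, 300, 40)]))
```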