Real-Time Speaker Identification and Subtitle Overlay with Multithreaded Audio Video Processing

Sahith Madamanchi, Gona Kushal, Srikesh Ravikumar, Puli Dhanvin, Remya M S, Prema Nedungadi

doi:10.1016/j.procs.2024.03.262

Abstract

Active Speaker Detection is central to many applications, particularly those that process live Audio-Video (AV) streams. Current implementations predominantly process saved video files, which limits their real-time applicability. To address this gap, the proposed model uses a multithreaded system to detect active speakers in live AV streams. This system is a core component of a software solution that generates real-time subtitles and overlays them beside the active speaker. The feature is especially beneficial for individuals with hearing impairments, and it facilitates the transcription of foreign-language speech into English, thereby improving human interaction and understanding. The approach processes live AV streams promptly enough for immediate speaker identification and subtitle overlay, advancing real-time communication assistance.
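
The multithreaded handling of a live AV stream described above can be illustrated with a minimal sketch. This is not the authors' implementation: it only assumes that audio and video are captured on separate threads and paired by timestamp before speaker detection. The names `capture`, `pair_streams`, and `run_pipeline`, the 20 ms tolerance, and the use of in-memory lists in place of real capture devices are all hypothetical choices made for illustration.

```python
import queue
import threading

SENTINEL = None  # marks the end of a stream


def capture(source, out_queue):
    """Producer: push (timestamp, payload) items from a source, then a sentinel."""
    for item in source:
        out_queue.put(item)  # blocks when the bounded queue is full
    out_queue.put(SENTINEL)


def pair_streams(audio_q, video_q, tolerance=0.02):
    """Consumer: pair audio chunks and video frames whose timestamps are close."""
    pairs = []
    audio = audio_q.get()
    video = video_q.get()
    while audio is not SENTINEL and video is not SENTINEL:
        dt = audio[0] - video[0]
        if abs(dt) <= tolerance:
            pairs.append((audio[1], video[1]))
            audio = audio_q.get()
            video = video_q.get()
        elif dt < 0:
            audio = audio_q.get()  # audio chunk is older: drop it
        else:
            video = video_q.get()  # video frame is older: drop it
    # Drain whichever stream is still running so its producer can finish.
    while audio is not SENTINEL:
        audio = audio_q.get()
    while video is not SENTINEL:
        video = video_q.get()
    return pairs


def run_pipeline(audio_source, video_source):
    """Run one capture thread per stream and pair their output on the caller's thread."""
    audio_q = queue.Queue(maxsize=8)  # bounded, so capture cannot run far ahead
    video_q = queue.Queue(maxsize=8)
    threads = [
        threading.Thread(target=capture, args=(audio_source, audio_q)),
        threading.Thread(target=capture, args=(video_source, video_q)),
    ]
    for t in threads:
        t.start()
    pairs = pair_streams(audio_q, video_q)
    for t in threads:
        t.join()
    return pairs
```

In a live setting, each paired chunk and frame would be handed to a speaker-detection and subtitle stage; the bounded queues keep the capture threads from running far ahead of that stage.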

Full Text