Abstract

Active speaker detection is central to many applications, particularly those that process live audio-video (AV) streams. Current implementations predominantly operate on saved video files, which limits their real-time applicability. To address this gap, the proposed model uses a multi-threaded system to detect active speakers in live AV streams. This system is a key component of a software solution that generates real-time subtitles and overlays them beside the active speaker. This feature is especially beneficial for individuals with hearing impairments, and it facilitates the translation of foreign-language speech into English, thereby improving human interaction and understanding. Our approach is distinguished by its ability to process live AV streams promptly enough for immediate speaker identification and subtitle overlay, an advance in real-time communication assistance.
