Abstract

Active speaker detection is central to many applications, particularly those that process live audio-video (AV) streams. Current implementations predominantly operate on saved video files, which limits their real-time applicability. To address this gap, the proposed model uses a multi-threaded system to detect active speakers in live AV streams. This system is a critical component of a software solution that generates real-time subtitles and overlays them beside the active speaker. This feature is especially beneficial for individuals with hearing impairments, and it supports the transcription of foreign-language speech into English, thereby improving human interaction and understanding. Our approach stands out for its ability to process live AV streams promptly enough for immediate speaker identification and subtitle overlay, a significant advance in real-time communication assistance.
