Abstract

This project investigates integrating Blind Source Separation (the DUET algorithm) with Automatic Speech Recognition (the Wav2Vec2 model) for real-time, accurate transcription in multi-speaker scenarios. Targeted specifically at improving accessibility for individuals with hearing impairments, the project addresses the challenging task of separating and transcribing speech from simultaneous speakers in varied contexts. The DUET algorithm separates individual voices from complex audio mixtures, and the Wav2Vec2 machine learning model then transcribes those voices into text. Despite their capabilities, both techniques have limitations, particularly in handling complicated audio scenes and in computational efficiency. Looking ahead, the research proposes a feedback mechanism between the two systems as a potential remedy: by letting each system dynamically adjust to the other's outputs, it could yield more accurate and efficient separation and transcription. This promising direction, however, brings new challenges of its own, notably in system complexity, in defining actionable feedback parameters, and in maintaining efficiency in real-time applications.
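To make the separation stage concrete, the following is a minimal sketch of the core DUET idea: estimate a per-time-frequency-bin attenuation ratio between two microphone channels, cluster bins by that ratio, and apply binary masks to recover each source. The synthetic two-sinusoid signals, the mixing coefficients, and the simple sign-based clustering are illustrative assumptions; the full DUET algorithm described in the abstract also estimates relative delays and clusters peaks in a 2D attenuation-delay histogram, and this sketch is not the project's implementation.

```python
import numpy as np
from scipy.signal import stft, istft

fs = 8000
t = np.arange(fs) / fs
s1 = np.sin(2 * np.pi * 440 * t)   # hypothetical source 1
s2 = np.sin(2 * np.pi * 1200 * t)  # hypothetical source 2

# Assumed anechoic mixing: source 1 is louder at mic 1, source 2 at mic 2.
x1 = 1.0 * s1 + 0.3 * s2
x2 = 0.3 * s1 + 1.0 * s2

# Time-frequency representations of both mixtures.
_, _, X1 = stft(x1, fs, nperseg=256)
_, _, X2 = stft(x2, fs, nperseg=256)

# Per-bin attenuation estimate a = |X2/X1|, then the symmetric
# attenuation alpha = a - 1/a used by DUET for clustering.
eps = 1e-12
a = np.abs(X2) / (np.abs(X1) + eps)
alpha = a - 1.0 / (a + eps)

# Toy clustering: bins dominated by s1 have a ~ 0.3 (alpha < 0),
# bins dominated by s2 have a ~ 3.3 (alpha > 0).
mask1 = alpha < 0
mask2 = ~mask1

# Binary masking and inverse STFT to reconstruct each source.
_, y1 = istft(np.where(mask1, X1, 0), fs, nperseg=256)
_, y2 = istft(np.where(mask2, X2, 0), fs, nperseg=256)

def corr(u, v):
    """Absolute correlation between two signals, truncated to equal length."""
    n = min(len(u), len(v))
    return abs(np.corrcoef(u[:n], v[:n])[0, 1])
```

With frequencies this well separated, each masked reconstruction correlates strongly with its original source; real overlapping speech is far harder, which is one of the limitations the abstract notes.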
