Abstract
Automatic speech recognition (ASR), powered by deep learning techniques, is crucial for enhancing human-computer interaction. However, its full potential remains unrealized in diverse real-world environments, as challenges such as dialects, accents, and domain-specific jargon, particularly in fields like surgery, persist. Here, we investigate the potential of large language models (LLMs) as error correction modules for ASR. We leverage Whisper-medium or ASR-LibriSpeech for speech recognition, and GPT-3.5 or GPT-4 for error correction. We employ various prompting methods, from zero-shot to few-shot with leading questions and sample medical terms, to correct erroneous transcriptions. Results, measured by word error rate (WER), reveal Whisper’s superior transcription accuracy over ASR-LibriSpeech, with a WER of 11.93% compared to 32.09%. GPT-3.5, with the few-shot prompting method using medical terms, further enhances performance, achieving a 64.29% and 37.83% WER reduction for Whisper and ASR-LibriSpeech, respectively. Additionally, Whisper exhibits faster execution speed. Substituting GPT-3.5 with GPT-4 further enhances transcription accuracy. Despite some remaining challenges, our approach demonstrates the potential of leveraging domain-specific knowledge through LLM prompting for accurate transcription, particularly in sophisticated domains like surgery.
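To make the pipeline concrete, the following is a minimal sketch of the ASR-plus-LLM correction loop the abstract describes, assuming the openai (v1.x), openai-whisper, and jiwer packages. The prompt wording, example medical terms, few-shot pairs, and file names are illustrative assumptions, not the paper's exact materials.

```python
# Sketch: transcribe with Whisper-medium, correct with a few-shot GPT prompt
# seeded with medical terms, then score with word error rate (WER).
import whisper
from openai import OpenAI
from jiwer import wer

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# 1) Speech recognition with Whisper-medium (audio file is hypothetical).
asr_model = whisper.load_model("medium")
hypothesis = asr_model.transcribe("surgery_clip.wav")["text"]

# 2) Few-shot correction prompt with sample medical terms.
#    Terms and example pairs below are made up for illustration.
MEDICAL_TERMS = ["laparoscopy", "hemostasis", "anastomosis", "trocar"]
FEW_SHOT = [
    ("please hand me the try car", "Please hand me the trocar."),
    ("we achieved he most as is", "We achieved hemostasis."),
]
messages = [{
    "role": "system",
    "content": ("You correct ASR transcripts of surgical speech. "
                f"Domain terms that may appear: {', '.join(MEDICAL_TERMS)}. "
                "Return only the corrected transcript."),
}]
for wrong, fixed in FEW_SHOT:
    messages.append({"role": "user", "content": wrong})
    messages.append({"role": "assistant", "content": fixed})
messages.append({"role": "user", "content": hypothesis})

corrected = client.chat.completions.create(
    model="gpt-3.5-turbo",  # or "gpt-4"
    messages=messages,
).choices[0].message.content

# 3) Evaluate both transcripts against a reference with WER.
reference = "Please hand me the trocar so we can begin the laparoscopy."
print(f"WER before correction: {wer(reference, hypothesis):.2%}")
print(f"WER after correction:  {wer(reference, corrected):.2%}")
```

Swapping the `model` argument from `gpt-3.5-turbo` to `gpt-4` reproduces the model substitution the abstract reports as further improving accuracy.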