Abstract

Automatic Speech Recognition (ASR) technology has seen major advancements in its accuracy and speed in recent years, making it a possible mechanism for supporting communication between people who are Deaf or Hard-of-Hearing (DHH) and their hearing peers. However, state-of-the-art ASR technology is still imperfect in many realistic settings. Researchers who evaluate ASR performance often focus on optimizing the Word Error Rate (WER) metric, but WER has been found to correlate poorly with human-subject performance in many applications. This article describes and evaluates several new captioning-focused evaluation metrics for predicting the impact of ASR errors on the understandability of automatically generated captions for people who are DHH. Through experimental studies with DHH users, we have found that our new metric (based on word-importance and semantic-difference scoring) is more closely correlated with DHH users' judgements of caption quality than pre-existing metrics for ASR evaluation.
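To make the contrast concrete, the sketch below shows the standard WER calculation alongside a hypothetical importance-weighted variant in Python. The weighting scheme and function names here are illustrative assumptions, not the exact word-importance and semantic-difference metric proposed in the article.

```python
# Illustrative sketch only: standard WER via word-level edit distance, plus a
# hypothetical importance-weighted variant. The weighting scheme below is an
# assumption for illustration, not the metric proposed in this article.

def wer(reference: list[str], hypothesis: list[str]) -> float:
    """Word Error Rate: (substitutions + deletions + insertions) / reference length."""
    n, m = len(reference), len(hypothesis)
    # dp[i][j] = minimum edits to turn reference[:i] into hypothesis[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution or match
    return dp[n][m] / max(n, 1)


def weighted_wer(reference: list[str], hypothesis: list[str],
                 weights: list[float], insert_cost: float = 1.0) -> float:
    """Hypothetical importance-weighted error rate: errors on high-importance
    reference words cost more. `weights[i]` is the importance of reference[i]."""
    n, m = len(reference), len(hypothesis)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] + weights[i - 1]   # deleting a reference word
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] + insert_cost      # inserting a spurious word
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0.0 if reference[i - 1] == hypothesis[j - 1] else weights[i - 1]
            dp[i][j] = min(dp[i - 1][j] + weights[i - 1],
                           dp[i][j - 1] + insert_cost,
                           dp[i - 1][j - 1] + sub)
    return dp[n][m] / max(sum(weights), 1e-9)


if __name__ == "__main__":
    ref = "the meeting starts at nine".split()
    hyp = "the meeting starts at five".split()
    print(wer(ref, hyp))                                      # 0.2: one error in five words
    print(weighted_wer(ref, hyp, [0.2, 0.8, 0.5, 0.3, 1.0]))  # ~0.36: the misrecognized word is important
```

In this toy example, plain WER treats the substitution of "nine" with "five" the same as any other error, while the weighted variant penalizes it more heavily because the time is the most important word for understanding the caption, which is the intuition behind importance-weighted evaluation.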
