Exploring Visual Scanning in Augmented Reality: Perspectives From Deaf and Hard of Hearing Users
Sensory-intensive, attention-demanding tasks such as visual scanning, interacting with 3D objects, and comprehending and following instructions are becoming more common in Augmented Reality (AR) environments as the technology spreads across diverse fields. It is important to understand how Deaf and Hard of Hearing (DHH) people experience these tasks, especially when the tasks involve sound or compete with attention shifts (e.g., watching someone signing) across real and virtual environments. Our research specifically aims to identify the challenges DHH users encounter when performing visual scanning in an AR environment. Using Angry Birds AR as a research probe, we had 11 DHH participants with varying hearing abilities play seven rounds of the game, followed by a short structured interview and a longer semi-structured interview. Our findings revealed that subtle audio cues and excessive visual indicators negatively affected participants’ performance. Participants also positioned themselves strategically for maximum spatial awareness but struggled with AR visual cues under the lighting conditions of the real environment. We suggest design implications such as customizable, user-friendly haptic and textual feedback and intelligent, spatially aware mechanisms for AR.
- Conference Article
14
- 10.1145/3405755.3406158
- Jul 22, 2020
With the proliferation of voice-based conversational user interfaces (CUIs) come accessibility barriers for Deaf and Hard of Hearing (DHH) users. There has been little prior research on sign-language conversational interactions with technology. In this paper, we motivate research on this topic and identify open questions and challenges in this space, including DHH users' interest in this technology, the types of commands they may use, and open design questions about how to structure conversational interaction in the sign-language modality. We also describe our current research methods for addressing these questions, including how we engage with the DHH community.
- Conference Article
19
- 10.1109/dexa.2009.92
- Jan 1, 2009
In this paper, we introduce a sign language interpreter module (SLIM), which delivers transparent sign language videos to deaf and hard of hearing users. Since their first language is sign language, these users rely primarily on the visual modality, with some speech input. Therefore, in addition to text and images, a video of a sign language interpreter should be provided. The SLIM system uses layers to overlay videos on existing Web pages, preserving the page's layout structure. Our evaluation study has shown that such a system is highly acceptable to deaf and hard of hearing users. We therefore propose enhancing the Web Content Accessibility Guidelines by adding a multimodal aspect for presenting existing Web information with transparent videos for deaf and hard of hearing users.
- Conference Article
37
- 10.1145/3373625.3418031
- Oct 26, 2020
Head-mounted displays can provide private and glanceable speech and sound feedback to deaf and hard of hearing people, yet prior systems have largely focused on speech transcription. We introduce HoloSound, a HoloLens-based augmented reality (AR) prototype that uses deep learning to classify and visualize sound identity and location in addition to providing speech transcription. This poster paper presents a working proof-of-concept prototype, and discusses future opportunities for advancing AR-based sound awareness.
- Conference Article
33
- 10.1145/3234695.3236343
- Oct 8, 2018
Mobile, wearable, and other ubiquitous computing devices are increasingly creating a context in which conventional keyboard and screen-based inputs are being replaced in favor of more natural speech-based interactions. Digital personal assistants use speech to control a wide range of functionality, from environmental controls to information access. However, many deaf and hard-of-hearing users have speech patterns that vary from those of hearing users due to incomplete acoustic feedback from their own voices. Because automatic speech recognition (ASR) systems are largely trained using speech from hearing individuals, speech-controlled technologies are typically inaccessible to deaf users. Prior work has focused on providing deaf users access to aural output via real-time captioning or signing, but little has been done to improve users' ability to provide input to these systems' speech-based interfaces. Further, the vocalization patterns of deaf speech often make accurate recognition intractable for both automated systems and human listeners, making traditional approaches to mitigate ASR limitations, such as human captionists, less effective. To bridge this accessibility gap, we investigate the limitations of common speech recognition approaches and techniques---both automatic and human-powered---when applied to deaf speech. We then explore the effectiveness of an iterative crowdsourcing workflow, and characterize the potential for groups to collectively exceed the performance of individuals. This paper contributes a better understanding of the challenges of deaf speech recognition and provides insights for future system development in this space.
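The paper does not spell out its aggregation step, but the core idea that a group can collectively outperform individual transcribers can be illustrated with a simple position-wise majority vote over several crowd transcripts. This is only a rough sketch of that idea, not the authors' workflow; real systems would first align transcripts (e.g., by edit distance) rather than vote by position.

```python
from collections import Counter

def merge_transcripts(transcripts):
    """Merge several crowd transcripts of the same utterance by
    position-wise majority vote (a simplified stand-in for the
    iterative workflow described in the paper)."""
    token_lists = [t.lower().split() for t in transcripts]
    merged = []
    for position in range(max(len(tokens) for tokens in token_lists)):
        votes = Counter(
            tokens[position] for tokens in token_lists if position < len(tokens)
        )
        word, _count = votes.most_common(1)[0]
        merged.append(word)
    return " ".join(merged)

# Example: three noisy crowd transcripts of the same short utterance.
print(merge_transcripts([
    "please turn on the lights",
    "please turn on the light",
    "peas turn on the lights",
]))  # -> "please turn on the lights"
```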
- Book Chapter
1
- 10.1007/978-3-030-78095-1_16
- Jan 1, 2021
While the availability of captioned television programming has increased, the quality of this captioning is not always acceptable to Deaf and Hard of Hearing (DHH) viewers, especially for live or unscripted content broadcast from local television stations. Although some current caption metrics focus on textual accuracy (comparing caption text with an accurate transcription of what was spoken), other properties may affect DHH viewers' judgments of caption quality. In fact, U.S. regulatory guidance on caption quality standards includes issues relating to how the placement of captions may occlude other video content. To this end, we conducted an empirical study with 29 DHH participants to investigate the effect on users' judgments of caption quality and their enjoyment of the video when captions overlap with an onscreen speaker's eyes or mouth, or with onscreen text. We observed significantly more negative user-response scores in the case of such overlap. Understanding the relationship between these occlusion features and DHH viewers' judgments of the quality of captioned video will inform future work towards the creation of caption evaluation metrics, to help ensure the accessibility of captioned television and video.
Keywords: Occlusion, Stimuli, Caption, Metric
- Conference Article
48
- 10.1145/3313831.3376758
- Apr 21, 2020
We introduce HomeSound, an in-home sound awareness system for Deaf and hard of hearing (DHH) users. Similar to the Echo Show or Nest Hub, HomeSound consists of a microphone and display, and uses multiple devices installed in each home. We iteratively developed two prototypes, both of which sense and visualize sound information in real-time. Prototype 1 provided a floorplan view of sound occurrences with waveform histories depicting loudness and pitch. A three-week deployment in four DHH homes showed an increase in participants' home- and self-awareness but also uncovered challenges due to lack of line of sight and sound classification. For Prototype 2, we added automatic sound classification and smartwatch support for wearable alerts. A second field deployment in four homes showed further increases in awareness but misclassifications and constant watch vibrations were not well received. We discuss findings related to awareness, privacy, and display placement and implications for future home sound awareness technology.
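As a rough illustration of the loudness and pitch features such a waveform history could plot, here is a minimal NumPy sketch using per-frame RMS energy and the dominant FFT bin. The frame size and sample rate are assumptions, and the authors' implementation may differ.

```python
import numpy as np

def loudness_and_pitch(signal, sample_rate=16000, frame_len=1024):
    """Per-frame RMS loudness (dB) and a coarse dominant-frequency estimate,
    roughly the features a waveform-history display could plot."""
    frames = []
    for start in range(0, len(signal) - frame_len, frame_len):
        frame = signal[start:start + frame_len]
        rms_db = 20 * np.log10(np.sqrt(np.mean(frame ** 2)) + 1e-12)
        spectrum = np.abs(np.fft.rfft(frame * np.hanning(frame_len)))
        freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
        pitch_hz = freqs[np.argmax(spectrum[1:]) + 1]  # skip the DC bin
        frames.append((rms_db, pitch_hz))
    return frames

# Example: a synthetic 440 Hz tone should report roughly 440 Hz in every frame.
t = np.linspace(0, 1.0, 16000, endpoint=False)
tone = 0.5 * np.sin(2 * np.pi * 440 * t)
print(loudness_and_pitch(tone)[:2])
```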
- Conference Article
9
- 10.1145/3597638.3608390
- Oct 22, 2023
Sound recognition tools have wide-ranging impacts for deaf and hard of hearing (DHH) people from being informed of safety-critical information (e.g., fire alarms, sirens) to more mundane but still useful information (e.g., door knock, microwave beeps). However, prior sound recognition systems use models that are pre-trained on generic sound datasets and do not adapt well to diverse variations of real-world sounds. We introduce AdaptiveSound, a real-time system for portable devices (e.g., smartphones) that allows DHH users to provide corrective feedback to the sound recognition model to adapt the model to diverse acoustic environments. AdaptiveSound is informed by prior surveys of sound recognition systems, where DHH users strongly desired the ability to provide feedback to a pre-trained sound recognition model to fine-tune it to their environments. Through quantitative experiments and field evaluations with 12 DHH users, we show that AdaptiveSound can achieve a significantly higher accuracy (+14.6%) than prior state-of-the-art systems in diverse real-world locations (e.g., homes, parks, streets, and malls) with little end-user effort (about 10 minutes of feedback).
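The abstract does not describe the model internals; the sketch below only illustrates the general adaptation pattern, fine-tuning a small classification head on user-corrected examples while a pre-trained audio encoder stays frozen. All names, dimensions, and hyperparameters are placeholders, not AdaptiveSound's code.

```python
import torch
import torch.nn as nn

# Hypothetical setup: a frozen pre-trained audio encoder produces embeddings;
# only this small classification head is updated from user corrections.
NUM_CLASSES, EMB_DIM = 10, 128
head = nn.Linear(EMB_DIM, NUM_CLASSES)
optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def adapt_on_feedback(embeddings, corrected_labels, steps=20):
    """Fine-tune only the classification head on the user's corrections."""
    for _ in range(steps):
        optimizer.zero_grad()
        logits = head(embeddings)
        loss = loss_fn(logits, corrected_labels)
        loss.backward()
        optimizer.step()
    return loss.item()

# Example with random stand-in embeddings for a handful of corrected clips.
feedback_embeddings = torch.randn(8, EMB_DIM)           # from the frozen encoder
feedback_labels = torch.randint(0, NUM_CLASSES, (8,))   # labels the user confirmed
print(adapt_on_feedback(feedback_embeddings, feedback_labels))
```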
- Book Chapter
- 10.1201/b11963-ch-40
- May 4, 2012
This chapter discusses interface technologies as they relate to the needs of users unable to hear auditory information. It begins with a discussion of hearing loss, followed by issues of language acquisition as they relate to hearing loss, and provides a brief overview of technologies that have been developed to assist with communication. The extent to which an individual makes use of hearing for communication, and whether the individual can hear computer sounds, varies greatly. Perhaps more surprising is the fact that many deaf and hard of hearing individuals have difficulty with reading. In many ways, computers and other technologies have proven to be of great benefit to deaf and hard of hearing users. The largely visual nature of information on the Internet makes this information accessible to deaf and hard of hearing users. A number of considerations can help provide the necessary visual support for a user who is deaf or hard of hearing.
- Conference Article
3
- 10.18653/v1/2022.ltedi-1.5
- Jan 1, 2022
Deaf and hard of hearing individuals regularly rely on captioning while watching live TV. Live TV captioning is evaluated by regulatory agencies using various caption evaluation metrics. However, these metrics are often not informed by the preferences of DHH users or by how meaningful the captions are. There is a need to construct caption evaluation metrics that take the relative importance of words in a transcript into account. We conducted a correlation analysis between two types of word embeddings and human-annotated word-importance scores in an existing corpus. We found that normalized contextualized word embeddings generated using BERT correlated better with the manually annotated importance scores than word2vec-based word embeddings. We make available a pairing of word embeddings and their human-annotated importance scores. We also provide proof-of-concept utility by training word-importance models, achieving an F1-score of 0.57 on the 6-class word-importance classification task.
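One simple way to reduce a contextualized embedding to a scalar that can be correlated with an importance score is its L2 norm; the sketch below illustrates that kind of analysis with HuggingFace BERT and SciPy. The paper's exact normalization and reduction may differ, and the annotated scores here are invented purely for illustration.

```python
import torch
from scipy.stats import spearmanr
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def token_embedding_norms(sentence):
    """Contextualized embedding for each word piece, reduced to its L2 norm."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]      # (tokens, 768)
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
    return [(tok, vec.norm().item()) for tok, vec in zip(tokens, hidden)]

# Example: correlate the per-token scalar with hypothetical annotated
# importance scores (sentence chosen so each word is a single word piece).
sentence = "the storm closed the airport on friday"
norms = [n for _, n in token_embedding_norms(sentence)[1:-1]]  # drop [CLS]/[SEP]
annotated = [0.1, 0.9, 0.7, 0.1, 0.8, 0.1, 0.6]                # illustrative only
print(spearmanr(norms, annotated))
```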
- Book Chapter
1
- 10.1201/9781420088885.ch8
- Mar 2, 2009
Computing Technologies for Deaf and Hard of Hearing Users
- Conference Article
12
- 10.1145/3544549.3585880
- Apr 19, 2023
Caption text conveys salient auditory information to deaf or hard-of-hearing (DHH) viewers, but the emotional information within speech is not captured. We developed three emotive captioning schemas that map the output of audio-based emotion detection models to expressive caption text that can convey underlying emotions. The three schemas used typographic changes to the text, color changes, or both. Next, we designed a Unity framework to implement these schemas and used it to generate stimuli videos. In an experimental evaluation with 28 DHH viewers, we compared participants' ability to understand emotions and their subjective judgments across the three captioning schemas. We found no significant difference in participants' ability to understand the emotion based on the captions or in their subjective preference ratings. Open-ended feedback revealed factors contributing to individual differences in preferences among participants and challenges with automatically generated emotive captions that motivate future work.
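The mapping idea (emotion label in, typographic and color treatment out) can be sketched as a small lookup table gated by detector confidence. The specific colors, weights, and threshold below are placeholders rather than the paper's schemas, which were implemented in Unity.

```python
# Illustrative mapping from a detected emotion label to caption styling.
# The specific typography/color choices are placeholders, not the paper's schemas.
EMOTION_STYLES = {
    "anger":   {"color": "#d62828", "weight": "bold",   "case": "upper"},
    "joy":     {"color": "#f4a261", "weight": "normal", "case": "none"},
    "sadness": {"color": "#457b9d", "weight": "normal", "case": "lower"},
    "neutral": {"color": "#ffffff", "weight": "normal", "case": "none"},
}

def style_caption(text, emotion, confidence, threshold=0.6):
    """Apply emotive styling only when the detector is reasonably confident."""
    key = emotion if confidence >= threshold else "neutral"
    style = EMOTION_STYLES.get(key, EMOTION_STYLES["neutral"])
    if style["case"] == "upper":
        text = text.upper()
    elif style["case"] == "lower":
        text = text.lower()
    return {"text": text, **style}

print(style_caption("I can't believe you did that!", "anger", confidence=0.82))
```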
- Book Chapter
7
- 10.1201/9781410615862.ch45
- Sep 19, 2007
Computing Technologies for Deaf and Hard of Hearing Users
- Conference Article
- 10.1109/ismar-adjunct68609.2025.00108
- Oct 8, 2025
Augmented reality (AR) has shown promise for supporting Deaf and hard-of-hearing (DHH) individuals by captioning speech and visualizing environmental sounds, yet existing systems do not allow users to create personalized sound visualizations. We present SonoCraftAR, a proof-of-concept prototype that empowers DHH users to author custom sound-reactive AR interfaces using typed natural language input. SonoCraftAR integrates real-time audio signal processing with a multi-agent LLM pipeline that procedurally generates animated 2D interfaces via a vector graphics library. The system extracts the dominant frequency of incoming audio and maps it to visual properties such as size and color, making the visualizations respond dynamically to sound. This early exploration demonstrates the feasibility of open-ended sound-reactive AR interface authoring and discusses future opportunities for personalized, AI-assisted tools to improve sound accessibility.
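A minimal sketch of the described mapping, dominant FFT frequency in and size plus color out, is shown below. The frequency range, scale, and color ramp are assumptions, not SonoCraftAR's actual parameters.

```python
import numpy as np

def dominant_frequency(samples, sample_rate=44100):
    """Strongest FFT bin of one audio buffer, in Hz."""
    spectrum = np.abs(np.fft.rfft(samples * np.hanning(len(samples))))
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate)
    return freqs[np.argmax(spectrum[1:]) + 1]  # ignore the DC bin

def frequency_to_visual(freq_hz, low=100.0, high=4000.0):
    """Map frequency to illustrative visual parameters: low pitches become
    large blue shapes, high pitches small red ones (the paper's mapping may differ)."""
    t = np.clip((freq_hz - low) / (high - low), 0.0, 1.0)
    size = 1.0 - 0.8 * t                            # relative scale, 1.0 down to 0.2
    color = (int(255 * t), 0, int(255 * (1 - t)))   # blue (low) to red (high)
    return {"size": round(float(size), 2), "rgb": color}

# Example: a synthetic 880 Hz buffer maps to a mid-sized, mostly blue shape.
t = np.linspace(0, 0.05, 2205, endpoint=False)
buffer = np.sin(2 * np.pi * 880 * t)
print(frequency_to_visual(dominant_frequency(buffer)))
```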
- Video Transcripts
- 10.48448/sp62-bg09
- May 12, 2022
- Underline Science Inc.
Deaf and hard of hearing individuals regularly rely on captioning while watching live TV. Live TV captioning is evaluated by regulatory agencies using various caption evaluation metrics. However, these metrics are often not informed by the preferences of DHH users or by how meaningful the captions are. There is a need to construct caption evaluation metrics that take the relative importance of words in a transcript into account. We conducted a correlation analysis between two types of word embeddings and human-annotated word-importance scores in an existing corpus. We found that normalized contextualized word embeddings generated using BERT correlated better with the manually annotated importance scores than word2vec-based word embeddings. We make available a pairing of word embeddings and their human-annotated importance scores. We also provide proof-of-concept utility by training word-importance models, achieving an F1-score of 0.57 on the 6-class word-importance classification task.
- Conference Article
26
- 10.1145/3441852.3471209
- Oct 17, 2021
Deaf or hard-of-hearing (DHH) individuals heavily rely on their visual senses to be aware about their environment, giving them heightened visual cognition and improved attention management strategies. Thus, the eyes have shown to play a significant role in these visual communication practices and, therefore, many various researches have adopted methodologies, specifically eye-tracking, to understand the gaze patterns and analyze the behavior of DHH individuals. In this paper, we provide a literature review from 55 papers and data analysis from eye-tracking studies concerning hearing impairment, attention management strategies, and their mode of communication such as Visual and Textual based communication. Through this survey, we summarize the findings and provide future research directions.