Purpose
As humans convey information about emotions through speech signals, emotion recognition via auditory information is often used to assess a person's affective state. There are numerous ways to apply knowledge of emotional vocal expressions to system designs that adequately accommodate users' needs. Yet little is known about how people with visual disabilities infer emotions from speech stimuli, especially via online platforms (e.g., Zoom). This study examined the degree to which they perceive emotions strongly or weakly (i.e., perceived intensity) and investigated the degree to which their sociodemographic backgrounds affect the intensity levels of emotion they perceive when exposed to a set of emotional speech stimuli via Zoom.

Materials and methods
A convenience sample of 30 individuals with visual disabilities participated in Zoom interviews. Participants were presented with a set of emotional speech stimuli and reported the intensity level of the perceived emotions on a rating scale from 1 (weak) to 8 (strong).

Results
When the participants were exposed to the emotional speech stimuli (calm, happy, fearful, sad, and neutral), they reported neutral as the dominant emotion, perceived with the greatest intensity. Individual differences in the perceived intensity of emotions were also observed and were associated with sociodemographic backgrounds, such as health, vision, job, and age.

Conclusions
The results of this study are anticipated to contribute fundamental knowledge helpful to many stakeholders, such as voice technology engineers, user experience designers, health professionals, and social workers who provide support to people with visual disabilities.

IMPLICATIONS FOR REHABILITATION
Technologies equipped with alternative user interfaces (e.g., Siri, Alexa, and Google Voice Assistant) that meet the needs of people with visual disabilities can promote independent living and quality of life.
Such technologies can also be equipped with systems that recognize emotions from users' voices, so that users can obtain services customized to their emotional needs or that adequately address their emotional challenges (e.g., early detection of onset, provision of advice).
The results of this study can benefit health professionals (e.g., social workers) who work closely with clients who have visual disabilities (e.g., in virtual telehealth sessions), as they could gain insights into how to recognize and understand clients' emotional struggles by hearing their voices, which contributes to enhancing emotional intelligence. They can thus provide better services to their clients, building strong bonds and trust between health professionals and clients with visual disabilities even when they meet virtually (e.g., via Zoom).