Abstract

Unlike able-bodied persons, visually impaired people, especially those of educational age, find it difficult to build a full perception of the world due to the lack of normal vision. The rapid development of AI and sensing technologies has provided new solutions for assisting visually impaired people. However, to our knowledge, most previous studies have focused on obstacle avoidance and environmental perception while paying less attention to educational assistance for visually impaired people. In this paper, we propose AviPer, a system that aims to assist visually impaired people in perceiving the world by creating a continuous, immersive, and educational assisting pattern. Equipped with a self-developed flexible tactile glove and a webcam, AviPer can simultaneously predict the grasped object and provide voice feedback using a vision-tactile fusion classification model while a visually impaired person perceives the object with the gloved hand. To achieve accurate multimodal classification, we creatively embed three attention mechanisms, namely temporal, channel-wise, and spatial attention, in the model. Experimental results show that AviPer achieves an accuracy of 99.75% in classifying 10 daily objects. We evaluated the system in a variety of extreme cases, which verified its robustness and demonstrated the necessity of fusing the visual and tactile modalities. We also conducted tests in the actual usage scenario and demonstrated the usability and user-friendliness of the system. We open-sourced the code and self-collected datasets in the hope of promoting research development and bringing changes to the lives of visually impaired people.
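To make the multimodal design summarized above more concrete, the sketch below shows one plausible way a vision-tactile fusion classifier could combine channel-wise, spatial, and temporal attention. It is a minimal illustration under assumed layer sizes, fusion order, and input shapes; it is not the authors' published AviPer architecture (see the full text and the open-sourced code for the actual implementation).

# Hypothetical sketch: vision-tactile fusion with temporal, channel-wise,
# and spatial attention. All dimensions and design choices are assumptions
# made for illustration, not the AviPer model itself.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel-wise attention."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):                        # x: (B, C, H, W)
        w = self.fc(x.mean(dim=(2, 3)))          # global average pool -> (B, C)
        return x * w[:, :, None, None]

class SpatialAttention(nn.Module):
    """CBAM-style spatial attention over the feature map."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):                        # x: (B, C, H, W)
        pooled = torch.cat([x.mean(1, keepdim=True),
                            x.amax(1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.conv(pooled))

class TemporalAttention(nn.Module):
    """Learned soft weighting over the time steps of a grasp sequence."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, x):                        # x: (B, T, D)
        w = torch.softmax(self.score(x), dim=1)  # (B, T, 1)
        return (w * x).sum(dim=1)                # (B, D)

class VisionTactileFusionNet(nn.Module):
    """Encodes webcam frames and tactile-glove maps, fuses them, classifies."""
    def __init__(self, num_classes=10):
        super().__init__()
        # Deliberately small CNN encoders; a real system would use deeper ones.
        self.vis_enc = nn.Sequential(nn.Conv2d(3, 16, 3, 2, 1), nn.ReLU(),
                                     ChannelAttention(16), SpatialAttention(),
                                     nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.tac_enc = nn.Sequential(nn.Conv2d(1, 16, 3, 2, 1), nn.ReLU(),
                                     ChannelAttention(16), SpatialAttention(),
                                     nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.temporal = TemporalAttention(32)
        self.head = nn.Linear(32, num_classes)

    def forward(self, vis_seq, tac_seq):
        # vis_seq: (B, T, 3, H, W) webcam frames; tac_seq: (B, T, 1, h, w) tactile maps
        B, T = vis_seq.shape[:2]
        v = self.vis_enc(vis_seq.flatten(0, 1)).view(B, T, -1)
        t = self.tac_enc(tac_seq.flatten(0, 1)).view(B, T, -1)
        fused = self.temporal(torch.cat([v, t], dim=-1))   # (B, 32)
        return self.head(fused)

if __name__ == "__main__":
    net = VisionTactileFusionNet(num_classes=10)
    logits = net(torch.randn(2, 5, 3, 64, 64), torch.randn(2, 5, 1, 16, 16))
    print(logits.shape)  # torch.Size([2, 10])

In this sketch, per-frame visual and tactile features are each refined by channel-wise and spatial attention, concatenated, and then pooled over time by temporal attention before classification; the actual ordering and fusion strategy in AviPer may differ.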
