Abstract

Visually impaired people face significant challenges in their work, studies, and daily lives. Numerous assistive navigation devices are now available to help them with everyday tasks. These devices typically include modules for object recognition, distance measurement, and text-to-speech output, and they are designed to help visually impaired users avoid obstacles and understand their surroundings through spoken descriptions. Because the devices must guard against all potential obstacles, their object recognition algorithms have to recognize a wide range of targets; the text-to-speech module, however, cannot read all of them aloud. We therefore designed a visual saliency assistance mechanism that estimates the regions of a scene to which humans are most likely to attend. Its output is overlaid on the object recognition results, which greatly reduces the number of targets that must be read aloud. In this way, the navigation device not only helps avoid obstacles but also tells the user which targets in the scene most people would find interesting. The proposed mechanism consists of three components: a spatio-temporal feature extraction (STFE) module, a spatio-temporal feature fusion (STFF) module, and a multi-scale feature fusion (MSFF) module. The STFF module fuses long-term spatio-temporal features and improves temporal memory across frames, while the MSFF module fully integrates information at different scales to improve the accuracy of saliency prediction. The proposed visual saliency model can therefore support the efficient operation of assistive navigation systems. It achieved Area Under ROC Curve - Judd (AUC-J) scores of 93.9%, 93.8%, and 91.5% on three widely used saliency datasets: Hollywood2, UCF Sports, and DHF1K, respectively.
The results show that our proposed model outperforms current state-of-the-art models.
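To illustrate the overlay step described above, the following minimal Python sketch shows one plausible way the saliency output could prune the recognizer's target list before text-to-speech; the function name, box format, and threshold are our own assumptions for illustration, not details from the paper.

```python
import numpy as np

def filter_detections_by_saliency(detections, saliency_map, threshold=0.5):
    """Keep only detections whose region is salient enough to announce.

    detections: list of (label, (x1, y1, x2, y2)) boxes from the recognizer.
    saliency_map: 2-D array in [0, 1] predicted by the saliency model.
    """
    announced = []
    for label, (x1, y1, x2, y2) in detections:
        region = saliency_map[y1:y2, x1:x2]
        # Announce the target only if its box overlaps a salient region.
        if region.size and region.mean() >= threshold:
            announced.append(label)
    return announced

# Toy example: a single salient patch; only the box over it is kept.
saliency = np.zeros((100, 100))
saliency[20:60, 20:60] = 1.0  # predicted human-attention region
dets = [("person", (25, 25, 55, 55)), ("trash can", (70, 70, 95, 95))]
print(filter_detections_by_saliency(dets, saliency))  # ['person']
```

The obstacle-avoidance path would still see every detection; only the spoken output is filtered, which matches the division of labor the abstract describes.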
