Abstract
Recent technologies have achieved notable progress. Since the main goal of technology is to make daily life easier, we investigate the development of an intelligent system to assist impaired people in their navigation. For visually impaired people, navigation is a complex task that requires assistance. To reduce the complexity of this task, it is preferable to provide information that supports the understanding of surrounding spaces. In particular, recognizing indoor scenes such as a room, supermarket, or office gives the visually impaired person a valuable guide to understanding the surrounding environment. In this paper, we propose an indoor scene recognition system based on recent deep learning techniques. The Vision Transformer (ViT) is a recent deep learning architecture that has achieved high performance on image classification, so we deploy it for indoor scene recognition. To achieve better performance while reducing computational complexity, we propose a dual multiscale attention mechanism that collects features at different scales for a richer representation. The main idea is to process small patches and large patches in separate branches and to fuse their features with a proposed fusion technique. The proposed fusion requires linear memory and computational complexity, compared with the quadratic complexity of existing techniques. To demonstrate the efficiency of the proposed technique, extensive experiments were performed on the public MIT67 dataset. The achieved results demonstrate the superiority of the proposed technique over the state-of-the-art. Furthermore, with its reduced parameter count and FLOPs, the proposed indoor scene recognition system is suitable for implementation on mobile devices.
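The linear-complexity fusion described above can be illustrated with a minimal sketch: the class (CLS) token of the small-patch branch acts as the sole query and attends over the large-patch branch's tokens, so the attention map is 1 × n rather than n × n. This is a simplified, hypothetical NumPy illustration of the general idea, not the authors' actual implementation; the function name, random projection weights, and dimensions are assumptions made for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_fuse(cls_small, tokens_large, d):
    """Fuse two branches: the small-patch branch's CLS token (query)
    attends over the large-patch branch's tokens (keys/values).

    Because there is only one query token, the attention map has shape
    (1, n): cost grows linearly with the number of tokens n, unlike
    full self-attention over the concatenated branches, which is O(n^2).
    Projection weights here are random placeholders for illustration.
    """
    rng = np.random.default_rng(0)
    Wq = rng.standard_normal((d, d)) / np.sqrt(d)
    Wk = rng.standard_normal((d, d)) / np.sqrt(d)
    Wv = rng.standard_normal((d, d)) / np.sqrt(d)

    q = cls_small @ Wq            # (1, d) single query token
    k = tokens_large @ Wk         # (n, d)
    v = tokens_large @ Wv         # (n, d)
    attn = softmax(q @ k.T / np.sqrt(d))  # (1, n) -- linear in n
    return attn @ v               # fused CLS token, shape (1, d)

# Example: 16-dim embeddings, large branch produces 10 patch tokens.
rng = np.random.default_rng(1)
fused = cross_attention_fuse(rng.standard_normal((1, 16)),
                             rng.standard_normal((10, 16)), 16)
```

The fused CLS token can then be returned to its own branch as a summary of the other branch's features, which is how such dual-branch designs typically exchange information at each fusion layer.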