Abstract

Humans use multimodal sensory information to understand the physical properties of their environment. Intelligent decision-making systems, such as those used in robotic applications, can likewise fuse multimodal information to improve their performance and reliability. In recent years, machine learning and deep learning methods have been at the heart of such intelligent systems. Developing visuo-tactile models is challenging because of limited datasets and the need to balance accuracy, reliability, and computational efficiency. In this research, we propose four efficient models based on dynamic neural network architectures for unimodal and multimodal object recognition. For unimodal object recognition, we propose TactileNet and VisionNet. For multimodal object recognition, FusionNet-A and FusionNet-B are designed to implement early and late fusion strategies, respectively. The proposed models have a flexible structure and can adapt at training or test time to the amount of available information. Model confidence calibration is employed to enhance the reliability and generalization of the models. The proposed models are evaluated on the MIT CSAIL large-scale multimodal dataset, and the results demonstrate accurate performance in both unimodal and multimodal scenarios. By using different fusion strategies and augmenting the tactile-based models with visual information, the top-1 error rate of the single-frame tactile model was reduced by 78% and the mean average precision was increased by 2.19 times. Although the focus here is on fusing tactile and visual modalities, the proposed design methodology generalizes to additional modalities.
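
To make the early versus late fusion distinction concrete, the sketch below shows a minimal PyTorch-style implementation of both strategies, including a late-fusion branch that can be dropped when only tactile input is available. The class names (EarlyFusionNet, LateFusionNet), feature dimensions, and layer choices are illustrative assumptions and do not reproduce the actual FusionNet-A or FusionNet-B architectures.

```python
# Hypothetical sketch of early vs. late fusion; sizes and names are assumptions,
# not the authors' FusionNet-A/FusionNet-B implementations.
import torch
import torch.nn as nn

class EarlyFusionNet(nn.Module):
    """Early fusion: tactile and visual features are concatenated before a shared head."""
    def __init__(self, tactile_dim=128, visual_dim=512, num_classes=10):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(tactile_dim + visual_dim, 256),
            nn.ReLU(),
            nn.Linear(256, num_classes),
        )

    def forward(self, tactile_feat, visual_feat):
        return self.head(torch.cat([tactile_feat, visual_feat], dim=-1))

class LateFusionNet(nn.Module):
    """Late fusion: each modality has its own classifier; their logits are averaged."""
    def __init__(self, tactile_dim=128, visual_dim=512, num_classes=10):
        super().__init__()
        self.tactile_head = nn.Linear(tactile_dim, num_classes)
        self.visual_head = nn.Linear(visual_dim, num_classes)

    def forward(self, tactile_feat, visual_feat=None):
        logits = self.tactile_head(tactile_feat)
        if visual_feat is not None:  # the visual branch can be omitted at test time
            logits = 0.5 * (logits + self.visual_head(visual_feat))
        return logits
```

In this reading, late fusion naturally accommodates a missing modality at test time, which is one way a model can adapt to the amount of available information as described above.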
