Abstract

Handing objects to and receiving objects from humans is a critical capability for collaborative robots. To achieve it, robots must be able to classify different types of human handover motions. Previous work has not focused on classifying the motion type from both the giver and receiver perspectives; instead, it has focused on object grasping, handover detection, and handover classification from one side only (giver or receiver). This paper discusses the design and implementation of several deep learning architectures based on long short-term memory (LSTM) networks, together with different feature selection techniques, for classifying human handovers from both the giver and receiver perspectives. Classification performance with unimodal and multimodal deep learning models is investigated. The models are evaluated on a publicly available dataset with four modalities: motion-tracking sensor readings, Kinect readings of 15 joint positions, 6-axis inertial sensor readings, and video recordings. Multimodality substantially improved classification performance, with the feature-selection-based deep learning architecture achieving 96% accuracy.
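The abstract does not detail the fusion architecture, but a common pattern for this kind of multimodal LSTM classifier is one LSTM encoder per modality with the final hidden states concatenated and passed to a shared classification head. The sketch below illustrates that pattern in PyTorch; it is not the authors' implementation, and the hidden size, class count, and per-modality feature dimensions are illustrative assumptions (45 corresponds to 15 Kinect joints × 3 coordinates, 6 to a 6-axis inertial sensor).

```python
# Minimal sketch of a late-fusion multimodal LSTM classifier.
# All dimensions and names are hypothetical, not taken from the paper.
import torch
import torch.nn as nn

class MultimodalLSTMClassifier(nn.Module):
    def __init__(self, modality_dims, hidden_size=64, num_classes=4):
        super().__init__()
        # One LSTM encoder per modality (e.g. motion tracking,
        # Kinect joint positions, inertial readings).
        self.encoders = nn.ModuleList(
            [nn.LSTM(dim, hidden_size, batch_first=True) for dim in modality_dims]
        )
        # Concatenated per-modality features -> handover motion class.
        self.head = nn.Linear(hidden_size * len(modality_dims), num_classes)

    def forward(self, inputs):
        # inputs: list of tensors, one per modality,
        # each shaped (batch, time, features).
        feats = []
        for x, lstm in zip(inputs, self.encoders):
            _, (h_n, _) = lstm(x)   # h_n: (num_layers, batch, hidden_size)
            feats.append(h_n[-1])   # final hidden state of the last layer
        fused = torch.cat(feats, dim=1)
        return self.head(fused)

# Usage with made-up feature sizes for three sensor modalities:
model = MultimodalLSTMClassifier(modality_dims=[6, 45, 6])
batch = [torch.randn(8, 100, d) for d in [6, 45, 6]]
logits = model(batch)  # shape: (8, num_classes)
```

A unimodal baseline of the kind the paper compares against would simply use a single encoder from this list; the reported gain from multimodality comes from fusing the per-modality representations before classification.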
