Abstract
Recognizing human activities is one of the crucial capabilities that a robot needs to have to be useful around people. Although modern robots are equipped with various types of sensors, human activity recognition (HAR) still remains a challenging problem, particularly in the presence of noisy sensor data. In this work, we introduce a multimodal graphical attention-based HAR approach, called Multi-GAT, which hierarchically learns complementary multimodal features. We develop a multimodal mixture-of-experts model to disentangle and extract salient modality-specific features that enable feature interactions. Additionally, we introduce a novel message-passing based graphical attention approach to capture cross-modal relation for extracting complementary multimodal features. The experimental results on two multimodal human activity datasets suggest that Multi-GAT outperformed state-of-the-art HAR algorithms across all datasets and metrics tested. Finally, the experimental results with noisy sensor data indicate that Multi-GAT consistently outperforms all the evaluated baselines. The robust performance suggests that Multi-GAT can enable seamless human-robot collaboration in noisy human environments.
Published Version
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have