TAMformer: Multi-Modal Transformer with Learned Attention Mask for Early Intent Prediction

Nada Osman, Lamberto Ballan, Guglielmo Camporese

doi:10.1109/icassp49357.2023.10095781

Abstract

Human intention prediction is a growing area of research in which a vision-based system has to anticipate an activity in a video. To this end, the model builds a representation of the past and then produces hypotheses about upcoming scenarios. In this work, we focus on early pedestrian intention prediction: from the current observation of an urban scene, the model predicts the future activity of pedestrians approaching the street. Our method is based on a multi-modal transformer that encodes past observations and produces multiple predictions at different anticipation times. Moreover, we propose to learn the attention masks of our transformer-based model (Temporal Adaptive Mask Transformer, TAMformer) in order to weight present and past temporal dependencies differently. We evaluate our method on several public benchmarks for early intention prediction, improving prediction performance at different anticipation times compared to previous work.
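The abstract does not say how the attention mask is parameterised, so the following is only a minimal NumPy sketch of one plausible reading: single-head causal attention in which a trainable matrix of logits is squashed to (0, 1) and used to re-weight how much each time step attends to each past step. The function name `learned_mask_attention`, the `mask_logits` parameter, and the sigmoid parameterisation are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax; rows may contain -inf entries.
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)

def learned_mask_attention(q, k, v, mask_logits):
    """Single-head attention over T time steps with a soft, learnable mask.

    q, k, v:      arrays of shape (T, d).
    mask_logits:  array of shape (T, T); hypothetical trainable parameters.
    """
    T, d = q.shape
    scores = q @ k.T / np.sqrt(d)                    # (T, T) similarity scores
    soft_mask = 1.0 / (1.0 + np.exp(-mask_logits))   # sigmoid -> weights in (0, 1)
    causal = np.tril(np.ones((T, T)))                # step t sees only steps <= t
    # Adding log(mask) to the scores scales each attention weight by the mask
    # before renormalisation; future positions get -inf and thus zero weight.
    with np.errstate(divide="ignore"):
        bias = np.log(soft_mask * causal)
    weights = softmax(scores + bias, axis=-1)
    return weights @ v, weights

rng = np.random.default_rng(0)
T, d = 5, 8
q, k, v = (rng.standard_normal((T, d)) for _ in range(3))
mask_logits = np.zeros((T, T))  # untrained: every visible step weighted equally
out, weights = learned_mask_attention(q, k, v, mask_logits)
```

In a trained model, `mask_logits` would be updated by gradient descent together with the other weights, so the network could learn, for example, to down-weight distant past frames relative to the present one.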

Similar Papers

A Novel Human Intention Prediction Approach Based on Fuzzy Rules through Wearable Sensing in Human-Robot Handover.
Rui Zou ... Jie Zhao
Biomimetics (Basel, Switzerland) | VOL. 8
10 Aug 2023

Audio and Video-based Emotion Recognition using Multimodal Transformers
Vijay John ... Yasutomo Kawanishi
21 Aug 2022

VL-InterpreT: An Interactive Visualization Tool for Interpreting Vision-Language Transformers
Estelle Aflalo ... Nan Duan
01 Jun 2022

Multi-modal transformer with language modality distillation for early pedestrian action anticipation
Nada Osman ... Lamberto Ballan
Computer Vision and Image Understanding | VOL. 249
10 Sep 2024
