HiFormer: Hierarchical transformer for grounded situation recognition

Yulin Pan

doi:10.54254/2755-2721/49/20241075

Abstract

The widespread use of surveillance video is critical to public safety, but camera operators are overwhelmed and existing Object Detection and Action Recognition models are unable to identify relevant events. In light of this, Grounded Situation Recognition (GSR) provides a practical solution for recognizing events in surveillance video. That is, GSR can identify the noun entities (e.g., humans) and their actions (e.g., driving), and provide grounding frames for the involved entities. Compared with Action Recognition and Object Detection, GSR is more in line with human cognitive habits, making its predictions easier for law enforcement agencies to understand. However, a crucial issue with most existing frameworks is their neglect of verb ambiguity, that is, verbs that are superficially similar but have distinct meanings (e.g., buying vs. giving). Many existing works propose a two-stage model, which first predicts the verb blindly and then uses this verb information to predict semantic roles. These frameworks ignore the importance of noun information during verb prediction, making them susceptible to misidentification. To address this problem and better discern between ambiguous verbs, we propose HiFormer, a novel hierarchical transformer framework that directly and comprehensively considers similar verbs for each image in order to more accurately identify the salient verb, semantic roles, and grounding frames. Compared with the state-of-the-art models in Grounded Situation Recognition (SituFormer and CoFormer), HiFormer shows an advantage of over 35% and 20% in Top-1 and Top-5 verb accuracy, respectively, as well as 13% in Top-1 noun accuracy.

Journal: Applied and Computational Engineering
Publication Date: Mar 22, 2024
License type: cc-by

Similar Papers

Ontology evolution for personalised and adaptive activity recognition
Muhammad Safyan ... Raul Garcia Castro
IET Wireless Sensor Systems | VOL. 9
01 Aug 2019

Development of a Wearable Camera and AI Algorithm for Medication Behavior Recognition.
Hwiwon Lee ... Sekyoung Youm
Sensors | VOL. 21
21 May 2021

ADR-SPLDA: Activity discovery and recognition by combining sequential patterns and latent Dirichlet allocation
Belkacem Chikhaoui ... Hélène Pigot
Pervasive and Mobile Computing | VOL. 8
10 Aug 2012

Enhancing Anomaly Detection in Surveillance Videos with Transfer Learning from Action Recognition
Kun Liu ... Tat-Seng Chua
12 Oct 2020
