Spatio-temporal human action localization in indoor surveillances

Zihao Liu,Danfeng Yan,Yuanqiang Cai,Yan Song

doi:10.1016/j.patcog.2023.110087

Abstract

Spatio-temporal action localization is a crucial and challenging task in the field of video understanding. Existing benchmarks for spatio-temporal action detection are limited by factors such as incomplete annotations, high-level non-universal actions, and uncommon scenarios. To address these limitations and facilitate research in real-world security applications, we introduce a novel human-centric dataset for spatio-temporal localization of atomic actions in indoor surveillance settings, termed as HIA (Human-centric Indoor Actions). The HIA dataset is constructed by selecting 30 atomic action classes, compiling 100 surveillance videos, and annotating 219,225 frames with 370,937 bounding boxes. The primary characteristics of HIA include (1) accurate spatio-temporal annotations for atomic actions, (2) human-centric annotations at the frame level, (3) temporal linking of persons across discontinuous tracks, and (4) utilization of indoor surveillance videos. Our HIA, with its realistic settings in indoor surveillance scenes and comprehensive annotations, presents a valuable and novel challenge to the spatio-temporal action localization domain. To establish a benchmark, we evaluate various methods and provide an in-depth analysis of the HIA dataset. The HIA dataset will be made available soon, and we anticipate that it will serve as a standard and practical benchmark for the research community.

Full Text