Abstract

We present a novel framework that improves video action detection by integrating heterogeneous features. Conventional action detection methods that model the relationships between person/object instances rely exclusively on video features and do not exploit valuable intra-instance heterogeneous features, such as person pose, positional information, or object category, that can support action recognition. Our proposed framework, termed the Heterogeneous Feature Fusion (HFF) framework, addresses this limitation by integrating such intra-instance heterogeneous features for person/object instances, and it can be applied to improve existing action detection methods. To efficiently exploit the heterogeneous features, which vary in importance depending on the action and/or scene, we introduce an attention mechanism that dynamically enhances the important heterogeneous features within an instance. Experiments on the JHMDB and AVA v2.2 datasets show that HFF significantly enhances the action detection performance of two existing methods.
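The abstract does not specify the form of the attention mechanism, so the following is only a minimal sketch of one way attention-weighted fusion of heterogeneous features within a single instance could look. It assumes each feature is linearly projected to a common dimension and weighted by a softmax over per-feature scores. The names `fuse_instance`, `projections`, and `score_vector`, the feature dimensions, and the random values standing in for learned parameters are all hypothetical and are not taken from the paper.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def project(vec, matrix):
    """Linear projection: matrix has one row per output dimension."""
    return [sum(v * w for v, w in zip(vec, row)) for row in matrix]


def fuse_instance(features, projections, score_vector):
    """Fuse the heterogeneous features of one person/object instance.

    features:     dict name -> feature vector (lengths may differ)
    projections:  dict name -> matrix mapping that feature to a common size D
    score_vector: length-D vector used to score each projected feature
    Returns the fused length-D vector and the per-feature attention weights.
    """
    names = sorted(features)
    projected = [project(features[n], projections[n]) for n in names]
    # One scalar score per feature type (assumed dot-product scoring).
    scores = [sum(p * s for p, s in zip(vec, score_vector)) for vec in projected]
    weights = softmax(scores)
    # Attention-weighted sum of the projected features.
    fused = [
        sum(w * vec[i] for w, vec in zip(weights, projected))
        for i in range(len(score_vector))
    ]
    return fused, dict(zip(names, weights))


# Hypothetical feature sizes; random values stand in for learned parameters.
random.seed(0)
D = 8
dims = {"video": 16, "pose": 34, "position": 4, "category": 10}
features = {n: [random.gauss(0, 1) for _ in range(d)] for n, d in dims.items()}
projections = {
    n: [[random.gauss(0, 0.1) for _ in range(d)] for _ in range(D)]
    for n, d in dims.items()
}
score_vector = [random.gauss(0, 1) for _ in range(D)]

fused, weights = fuse_instance(features, projections, score_vector)
```

In this sketch the weights are recomputed for every instance, which is one simple way to let the relative importance of pose, position, or category change with the action or scene.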
