Abstract
We present a novel framework aimed at improving video action detection through the integration of heterogeneous features. Conventional action detection methods that model the relationships between person/object instances rely exclusively on video features and do not exploit valuable intra-instance heterogeneous features, such as person pose, positional information, or object category, that can support action recognition. Our proposed Heterogeneous Feature Fusion (HFF) framework addresses this limitation by integrating such intra-instance heterogeneous features for person/object instances, and it can be used to improve existing action detection methods. To efficiently exploit the heterogeneous features, whose importance varies depending on the action and/or scene, we introduce an attention mechanism that dynamically enhances the important heterogeneous features within an instance. Experiments on the JHMDB and AVA v2.2 datasets show that HFF significantly enhances the action detection performance of two existing methods.
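To make the idea of attention over intra-instance heterogeneous features concrete, the following is a minimal sketch and not the architecture used in HFF. It assumes that each instance already has one embedding per feature type (for example video, pose, position, and object category), all projected to a common dimension, and that importance is scored with a single learned vector. The names `fuse_heterogeneous` and `score_w`, the array shapes, and the softmax-weighted sum are all illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def fuse_heterogeneous(features, score_w):
    """Attention-weighted fusion of per-instance feature types (hypothetical helper).

    features: (num_instances, num_types, dim), one embedding per feature type
              (e.g. video, pose, position, category), assumed to share `dim`.
    score_w:  (dim,) scoring vector; learned in practice, random in this sketch.
    """
    scores = features @ score_w                 # (num_instances, num_types)
    weights = softmax(scores, axis=1)           # importance of each type per instance
    fused = (weights[..., None] * features).sum(axis=1)  # (num_instances, dim)
    return fused, weights

# Toy input: 3 instances, 4 feature types, 8-dimensional embeddings.
rng = np.random.default_rng(0)
feats = rng.normal(size=(3, 4, 8))
w = rng.normal(size=8)
fused, weights = fuse_heterogeneous(feats, w)
```

In this sketch the weights are computed separately for every instance, so a feature type such as pose can dominate for one person and be suppressed for another, which is the behavior the abstract attributes to the attention mechanism.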