Abstract

We propose a single-shot approach for actor-action detection in videos. Existing approaches use a two-step process that relies on a Region Proposal Network (RPN): the action is estimated from the detected proposals, followed by post-processing such as non-maximal suppression. While these methods perform well, they scale poorly to dense video scenes: thousands of proposals carry a high memory requirement, which leads to slow processing. We propose SSA2D, a unified end-to-end deep network that performs joint actor-action detection in a single shot without the need for any proposals or post-processing, making it both memory- and time-efficient.
