Abstract
The unprecedented success of deep convolutional neural networks (CNNs) on video-based human action recognition assumes the availability of good-resolution videos and the resources to develop and deploy complex models. Unfortunately, budgetary and environmental constraints on the camera system and the recognition model may not accommodate these assumptions and may require reducing their complexity. To alleviate these issues, we introduce a deep sensing solution that recognizes human actions directly from coded exposure images. Our deep sensing solution consists of a binary CNN-based encoder network that emulates the capture of a coded exposure image of a dynamic scene by a coded exposure camera, followed by a 2D CNN that recognizes human action in the captured coded exposure image. Furthermore, we propose a novel knowledge distillation framework to jointly train the encoder and the action recognition model, and show that the proposed training approach improves action recognition accuracy by absolute margins of 6.2%, 2.9%, and 7.9% on the Something-Something-v2, Kinetics-400, and UCF-101 datasets, respectively, compared to our previous approach. Finally, we built a prototype coded exposure camera using a liquid crystal on silicon (LCoS) device to validate the feasibility of our deep sensing solution. Our evaluation of the prototype camera shows results that are consistent with the simulation results.
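To make the idea of coded exposure concrete: a coded exposure camera integrates light over a clip while a per-pixel binary shutter code selects which frames each pixel observes, collapsing the clip into a single 2D image. The sketch below is a minimal simulation of that image-formation model in NumPy; the function name `coded_exposure` and the normalization by the per-pixel open-shutter count are illustrative assumptions, not the paper's learned binary encoder.

```python
import numpy as np

def coded_exposure(frames, code):
    """Simulate a coded exposure capture of a short clip.

    frames: (T, H, W) array of grayscale video frames in [0, 1].
    code:   (T, H, W) binary array; code[t, y, x] = 1 means pixel (y, x)
            is exposed during frame t.
    Returns a single (H, W) coded image.

    Hypothetical helper for illustration only -- in the paper the binary
    code is produced by a learned CNN encoder, not sampled at random.
    """
    assert frames.shape == code.shape
    # Each pixel integrates only the frames where its shutter code is open.
    coded = (frames * code).sum(axis=0)
    # Normalize by the per-pixel open-shutter count so intensities stay
    # comparable across pixels with different exposure durations.
    counts = np.maximum(code.sum(axis=0), 1)
    return coded / counts

# Example: an 8-frame clip with a random binary exposure code.
rng = np.random.default_rng(0)
frames = rng.random((8, 16, 16))
code = rng.integers(0, 2, size=(8, 16, 16))
img = coded_exposure(frames, code)  # (16, 16) coded exposure image
```

The downstream 2D CNN in the paper then classifies the action from `img` alone, which is why the joint training of the code and the classifier matters: the code must preserve motion cues in a single frame-sized image.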
Published in: IEEE Transactions on Pattern Analysis and Machine Intelligence