Abstract

This work presents a novel pipeline for action recognition, the task of classifying the action occurring in a scene. High-speed cameras are commonly used to capture high frame-rate videos containing rich motion information; however, the resulting data volume becomes the bottleneck of the system. Building on the insight that the discrete cosine transform (DCT) of a video signal reveals its motion information remarkably well, the proposed method directly captures the DCT spectrum of a video in a single shot through optical pixel-wise encoding, instead of acquiring video frames as traditional cameras do. Because video signals are sparsely distributed in the DCT domain, a learning-based frequency selector is designed to prune the trivial frequency channels of the spectrum. An opto-electronic neural network then performs action recognition from the single coded spectrum: the optical encoder generates the DCT spectrum, while the electronic part of the network jointly optimizes the frequency selector and the classification model. Compared to conventional video-based action recognition methods, the proposed method achieves higher accuracy with less data, lower communication bandwidth, and a lighter computational burden. Both simulations and experiments demonstrate its superior action recognition performance. To the best of our knowledge, this is the first work to investigate action recognition in the DCT domain.
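To make the two core ideas concrete — a pixel-wise temporal DCT of the video and pruning of low-energy frequency channels — the following is a minimal numerical sketch. It is not the paper's optical encoder or learned selector: the DCT here is computed digitally along the time axis, and the selector is a simple energy-based top-k stand-in for the learning-based frequency selector described above; function names and shapes are illustrative assumptions.

```python
import numpy as np

def dct_matrix(n):
    # Orthonormal DCT-II basis matrix (n x n).
    k = np.arange(n)[:, None]
    t = np.arange(n)[None, :]
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * t + 1) * k / (2 * n))
    C[0, :] = np.sqrt(1.0 / n)
    return C

def temporal_dct_spectrum(video):
    # video: array of shape (T, H, W); apply the DCT along the time
    # axis independently at every pixel (a digital stand-in for the
    # single-shot optical pixel-wise encoding).
    T = video.shape[0]
    C = dct_matrix(T)
    return np.tensordot(C, video, axes=(1, 0))  # shape (T, H, W)

def select_top_frequencies(spectrum, k):
    # Hand-crafted stand-in for the learned frequency selector:
    # keep the k temporal-frequency channels with the highest energy.
    energy = (spectrum ** 2).sum(axis=(1, 2))
    idx = np.sort(np.argsort(energy)[::-1][:k])
    return idx, spectrum[idx]
```

For a static scene, all the temporal energy concentrates in the DC channel (index 0), so the selector discards nearly everything; motion spreads energy into higher frequency channels, which is the sparsity the pruning exploits.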
