Abstract
A fundamental challenge in machine learning today is to build a model that can learn from few examples. Here, we describe a reservoir based spiking neural model for learning to recognize actions with a limited number of labeled videos. First, we propose a novel encoding, inspired by how microsaccades influence visual perception, to extract spike information from raw video data while preserving the temporal correlation across different frames. Using this encoding, we show that the reservoir generalizes its rich dynamical activity toward signature action/movements enabling it to learn from few training examples. We evaluate our approach on the UCF-101 dataset. Our experiments demonstrate that our proposed reservoir achieves 81.3/87% Top-1/Top-5 accuracy, respectively, on the 101-class data while requiring just 8 video examples per class for training. Our results establish a new benchmark for action recognition from limited video examples for spiking neural models while yielding competitive accuracy with respect to state-of-the-art non-spiking neural models.
Highlights
The exponential increase in digital data with online media, surveillance cameras among others, creates a growing need to develop intelligent models for complex spatio-temporal processing
Since the main aim of our work is to develop a spiking model for action recognition in videos, we model the desired spiking activity at the output layer based on one-hot encoding
We demonstrated the effectiveness of a spikingbased reservoir computing architecture for learning from limited video examples on UCF101 action recognition task
Summary
The exponential increase in digital data with online media, surveillance cameras among others, creates a growing need to develop intelligent models for complex spatio-temporal processing. Studies on mammalian concept understanding have shown that humans and animals can rapidly learn complex visual concepts from single training examples (Lake et al, 2011). The state-of-the-art intelligent models require vast quantities of labeled data with extensive, iterative training to learn suitably and yield high performance. While the proliferation of digital media has led to the availability of massive raw and unstructured data, it is often impractical and expensive to gather annotated training datasets for all of them. This motivates our work on learning to recognize from few labeled data. We propose a reservoir based spiking neural model that reliably learns to recognize actions in video data by extracting long-range structure from a small number of training examples
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.