Abstract

This paper presents a novel unsupervised, event-driven, spatio-temporal domain adaptation (DA) method for dynamic vision sensor (DVS) gesture recognition. The method transfers knowledge from the source domain to the target domain in both the spatial and temporal dimensions without requiring target-domain labels. Specifically, it consists of a deep spiking neural network (SNN)-based feature extractor, a label predictor, and a domain discriminator. A time-space gradient reversal layer builds a spatio-temporal bridge between the domain discriminator and the feature extractor; it is essential for aligning the source-domain spike features with the target ones and for achieving domain adaptation in both the spatial and temporal dimensions. To demonstrate the effectiveness of our method, we adapted DVS hand gesture data from one temporal resolution to another and from original data to denoised data. Our method provides up to a 10.39% improvement in accuracy, and its accuracy gains are more stable than those of RNN-based and LSTM-based methods in this DA framework, especially when the two domains are only partially similar in the DVS data.
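For background, the sketch below shows a standard gradient reversal layer of the kind used in DANN-style adversarial domain adaptation, written in PyTorch: it is the identity in the forward pass and flips the sign of gradients flowing back from the domain discriminator into the feature extractor. The paper's time-space variant operating on spike features is its own contribution and is not specified in the abstract, so it is not reproduced here; the names GradReverse, grad_reverse, and the lam parameter are illustrative assumptions.

```python
import torch

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; scales and flips gradients in the backward pass."""

    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam          # adaptation strength (illustrative hyperparameter)
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Gradients from the domain discriminator are reversed before reaching
        # the feature extractor, pushing it toward domain-invariant features.
        return -ctx.lam * grad_output, None

def grad_reverse(x, lam=1.0):
    """Apply gradient reversal to a feature tensor x."""
    return GradReverse.apply(x, lam)
```

In such a pipeline, the label predictor would receive the extracted features directly, while the domain discriminator would receive grad_reverse(features), so that minimizing the discriminator's loss simultaneously trains the feature extractor to confuse it.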
