Abstract

Mimicking the sampling mechanism of the primate fovea, a retina-inspired vision sensor called the spiking camera has been developed, showing great potential for capturing high-speed dynamic scenes with a sampling rate of 40,000 Hz. Unlike conventional digital cameras, the spiking camera continuously captures photons and outputs asynchronous binary spikes with varying inter-spike intervals to record dynamic scenes. However, reconstructing dynamic scenes from asynchronous spike streams remains challenging. In this work, we propose a novel pretext task to build a self-supervised reconstruction framework for spiking cameras. Specifically, we adopt the blind-spot network commonly used in self-supervised denoising as our backbone and perform self-supervised learning by constructing suitable pseudo-labels. Furthermore, because the blind-spot network scales poorly and underutilizes the available information, we present a mutual learning framework that improves overall performance through mutual distillation between a non-blind-spot network and a blind-spot network. This also allows the network to bypass the constraints of the blind-spot architecture, so that state-of-the-art modules can be incorporated to further improve performance. Experimental results demonstrate that our methods clearly outperform previous unsupervised spiking camera reconstruction methods and achieve competitive results compared with supervised methods.
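To make the mutual-distillation idea concrete, the following is a minimal sketch of the training loop it implies. Everything here is an illustrative assumption rather than the paper's actual code: the stand-in networks (a real blind-spot network would mask the center pixel of its receptive field), the temporal-average pseudo-label, and the L1 loss weighting are all placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyNet(nn.Module):
    """Stand-in CNN. In the actual framework, the blind-spot network (BSN)
    would exclude the center pixel from its receptive field, e.g. via
    shifted or masked convolutions; the NBSN has no such restriction."""
    def __init__(self, in_ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1),
        )

    def forward(self, x):
        return self.body(x)

def pseudo_label(spikes):
    """Hypothetical pseudo-label: a temporal average of the binary spike
    stream, loosely analogous to interval-based intensity estimates. The
    paper's pseudo-label construction is not reproduced here."""
    return spikes.mean(dim=1, keepdim=True)

T = 32                                              # spike frames per window
spikes = (torch.rand(4, T, 64, 64) > 0.9).float()   # toy binary spike stream
bsn, nbsn = TinyNet(T), TinyNet(T)
opt = torch.optim.Adam(
    list(bsn.parameters()) + list(nbsn.parameters()), lr=1e-4
)

target = pseudo_label(spikes)
out_b, out_n = bsn(spikes), nbsn(spikes)

# Self-supervised terms against the pseudo-label, plus symmetric mutual
# distillation terms; detach() stops gradients through the "teacher" side
# so each network distills from a fixed copy of the other's output.
loss = (F.l1_loss(out_b, target) + F.l1_loss(out_n, target)
        + F.l1_loss(out_n, out_b.detach())
        + F.l1_loss(out_b, out_n.detach()))
opt.zero_grad()
loss.backward()
opt.step()
```

Because the NBSN is trained via distillation rather than directly on the blind-spot objective, it is free to use architectures that the blind-spot constraint would otherwise rule out, which is the scalability benefit the abstract refers to.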
