Fast Motion Understanding with Spatiotemporal Neural Networks and Dynamic Vision Sensors

Anthony Bisulco, Daniel D Lee, Fernando Cladera, Volkan Isler

doi:10.1109/icra48506.2021.9561290

Abstract

This paper presents a Dynamic Vision Sensor (DVS) based system for reasoning about high-speed motion. As a representative scenario, we consider a robot at rest reacting to a small object approaching at speeds higher than 15 m/s. Since conventional image sensors at typical frame rates observe such an object for only a few frames, estimating the underlying motion presents a considerable challenge for standard computer vision systems and algorithms. We present a method motivated by how animals such as insects solve this problem with their relatively simple vision systems. Our solution takes the event stream from a DVS and first encodes the temporal events with a set of causal exponential filters across multiple time scales. We couple these filters with a Convolutional Neural Network (CNN) to efficiently extract relevant spatiotemporal features. The combined network learns to output both the expected time to collision of the object and the predicted collision point on a discretized polar grid. The network computes these critical estimates with minimal delay so that the robot can react appropriately to the incoming object. We demonstrate the system on a toy dart moving at 23.4 m/s, obtaining a 24.73° error in θ, an 18.4 mm average discretized radius prediction error, and a 25.03% median time to collision prediction error.
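As an illustration of the encoding step described in the abstract, the following is a minimal sketch of a bank of causal exponential filters applied per pixel to a DVS event stream. It is not the authors' implementation: the function name `exp_filter_bank`, the event tuple layout `(t, x, y, polarity)`, the signed-polarity accumulation, and the time constants in the usage note are all assumptions made for the example. The abstract does not specify the exact filter form or parameters.

```python
import numpy as np

def exp_filter_bank(events, shape, taus, t_query):
    """Encode events with per-pixel causal exponential filters (illustrative sketch).

    events  : iterable of (t, x, y, polarity), sorted by t in seconds (assumed layout)
    shape   : (height, width) of the sensor
    taus    : time constants in seconds, one per time scale (assumed values)
    t_query : time at which the filter outputs are read out
    Returns an array of shape (len(taus), height, width).
    """
    height, width = shape
    taus = np.asarray(taus, dtype=np.float64)
    state = np.zeros((len(taus), height, width))
    last_t = np.zeros((height, width))  # time of the last update at each pixel

    for t, x, y, p in events:
        if t > t_query:
            break  # causal: events after the readout time are never used
        # Decay the stored value by the time elapsed at this pixel, then add the event.
        dt = t - last_t[y, x]
        state[:, y, x] = state[:, y, x] * np.exp(-dt / taus) + p
        last_t[y, x] = t

    # Decay every pixel from its last update to the readout time.
    dt = t_query - last_t
    state *= np.exp(-dt[None, :, :] / taus[:, None, None])
    return state
```

For example, calling `exp_filter_bank(events, (260, 346), [0.001, 0.01, 0.1], t_query)` would produce a three-channel tensor, one channel per time scale, that could be passed to a CNN. Short time constants emphasize the most recent events, while long ones retain a trace of the object's earlier positions, which is what lets a single readout carry motion information.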

Full Text