Abstract

Video object detection is a widely studied topic and has made significant progress in the past decades. However, the feature extraction and computation in existing video object detectors require decent imaging quality and freedom from severe motion blur. Under extremely dark scenarios, limited sensor sensitivity forces a trade-off between signal-to-noise ratio and motion blur, and detection performance deteriorates accordingly. To address this issue, we propose to temporally multiplex a frame sequence into one snapshot and extract the cues characterizing object motion for trajectory retrieval. For effective encoding, we build a prototype for encoded capture by mounting a highly compatible programmable shutter. Correspondingly, for decoding, we design an end-to-end deep network called detection from coded snapshot (DECENT) to retrieve sequential bounding boxes from the coded blurry measurements of dynamic scenes. For effective network learning, we generate quasi-real data by incorporating physically-driven noise into the temporally coded imaging model, which circumvents the lack of training data and generalizes well to real dark videos. The approach offers multiple advantages, including low bandwidth, low cost, compact setup, and high accuracy. Its effectiveness is experimentally validated under low-illumination conditions, providing a feasible way toward night surveillance.
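As a rough illustration of the temporally coded imaging model mentioned above, the sketch below simulates multiplexing a frame sequence into a single snapshot with a binary per-frame shutter code, then adds signal-dependent shot noise and Gaussian read noise as stand-ins for the physically-driven noise. All function names, code patterns, and noise parameters here are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def coded_snapshot(frames, code, read_noise_std=2.0, seed=None):
    """Simulate a temporally coded snapshot (illustrative model).

    frames : (T, H, W) array of per-frame photon counts
    code   : (T,) binary shutter code (1 = shutter open for that frame)
    """
    rng = np.random.default_rng(seed)
    # Temporal multiplexing: code-weighted sum of the frame sequence.
    clean = np.tensordot(code, frames, axes=1)
    # Signal-dependent shot noise (Poisson) plus sensor read noise (Gaussian);
    # the specific noise model is an assumption for illustration.
    noisy = rng.poisson(np.clip(clean, 0, None)).astype(float)
    noisy += rng.normal(0.0, read_noise_std, size=noisy.shape)
    return noisy

# Example: 8 dim frames multiplexed with a pseudo-random binary code.
T, H, W = 8, 32, 32
rng = np.random.default_rng(0)
frames = rng.uniform(5.0, 20.0, size=(T, H, W))   # low photon counts (dark scene)
code = rng.integers(0, 2, size=T).astype(float)
snapshot = coded_snapshot(frames, code, seed=1)
print(snapshot.shape)  # single coded measurement, same spatial size as one frame
```

A decoder such as DECENT would then be trained to recover sequential bounding boxes from measurements like `snapshot`, using quasi-real data generated by this kind of forward model.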
