Abstract

Neuromorphic vision sensors (NVSs) save energy and reduce data at the source by asynchronously recording changes in temporal contrast. NVSs thus provide an opportunity to exploit temporal and spatial redundancy in video streams: the downstream deep neural network (DNN) processor for object recognition needs to process only the foreground object regions in valid frames. However, NVS data inevitably contains noise, which leads to false frame generation. Moreover, objects may be fragmented due to a lack of events, which leads to wrong object region proposals (RPs). Hence, it is important to have an always-on image processor that performs image restoration (IR) and RP operations on NVS data. In this article, we propose a hybrid memory bitcell with collocated static random access memory (SRAM) and dynamic random access memory (DRAM) consisting of 11 transistors [11T-collocated SRAM and DRAM (CRAM)] to perform in-memory computing (IMC)-based IR and RP on event-based binary image (EBBI) frames from a stationary NVS. We propose IMC-based charge diffusion for IR (denoising and region filling) by enabling a 2-D interconnection of bitcells across the whole array for globally parallel computing. The proposed CRAM supports a projection mode for IMC-based RP, which enables 1-D projection of objects onto the horizontal and vertical axes and finds regions through a recently proposed iterative algorithm. We also propose an RP update (RPU) algorithm and hardware that improve RP accuracy by 1.6× over the prior art. Implemented in 65 nm CMOS, the 320 × 240 quarter video graphics array (QVGA) macro achieves a maximum energy efficiency (throughput) of 1220 TOPS/W (1301 GOPS) without RPU and 915 TOPS/W (976 GOPS) with RPU, both of which are superior to the prior art. We also show that the accuracy of IR and RP obtained by the proposed architecture is better than that of earlier methods.
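
To make the idea of projection-based RP concrete, the following is a minimal software sketch, not the iterative algorithm or the in-memory hardware referred to in the abstract. It assumes an EBBI frame stored as a list of rows of 0/1 pixels, projects it onto the horizontal axis, splits the projection into nonzero runs, and then projects each resulting column band onto the vertical axis. The function names `runs` and `propose_regions` and the half-open box convention `(x0, y0, x1, y1)` are illustrative assumptions.

```python
def runs(profile):
    """Return half-open [start, end) intervals where a 1-D profile is nonzero."""
    out, start = [], None
    for i, v in enumerate(profile):
        if v and start is None:
            start = i                      # a run of active pixels begins
        elif not v and start is not None:
            out.append((start, i))         # the run ended at index i
            start = None
    if start is not None:
        out.append((start, len(profile)))  # run reaches the end of the profile
    return out


def propose_regions(frame):
    """Propose boxes (x0, y0, x1, y1) from 1-D projections of a binary frame.

    Assumption: a simple two-pass projection (columns, then rows per band),
    used here only to illustrate the principle.
    """
    h = len(frame)
    w = len(frame[0]) if h else 0
    # Projection onto the horizontal axis: is any pixel set in column x?
    col_profile = [any(frame[y][x] for y in range(h)) for x in range(w)]
    boxes = []
    for x0, x1 in runs(col_profile):
        # Projection of this column band onto the vertical axis.
        row_profile = [any(frame[y][x] for x in range(x0, x1)) for y in range(h)]
        for y0, y1 in runs(row_profile):
            # Tighten the horizontal extent using only the rows of this box.
            sub_cols = [any(frame[y][x] for y in range(y0, y1))
                        for x in range(x0, x1)]
            xs = runs(sub_cols)
            boxes.append((x0 + xs[0][0], y0, x0 + xs[-1][1], y1))
    return boxes
```

A single projection pass can merge objects that overlap along one axis, which is one reason an iterative refinement of the projections is useful in practice.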
