Abstract

Automated video object recognition is a topic of emerging importance in both defense and civilian applications. This work describes an accurate and low-power neuromorphic architecture and system for real-time automated video object recognition. Our system, Neuromorphic Visual Understanding of Scenes (NEOVUS), is inspired by computational neuroscience models of feed-forward object detection and classification pipelines for processing visual data. The NEOVUS architecture is inspired by the ventral (what) and dorsal (where) streams of the mammalian visual pathway and integrates retinal processing, object detection based on form and motion modeling, and object classification based on convolutional neural networks. The object recognition performance and energy use of the NEOVUS were evaluated by the Defense Advanced Research Projects Agency (DARPA) under the Neovision2 program using three urban area video datasets collected from a mix of stationary and moving platforms. These datasets are challenging and include a large number of objects of different types in cluttered scenes, with varying illumination and occlusion conditions. In a systematic evaluation of five different teams by DARPA on these datasets, the NEOVUS demonstrated the best performance, with high object recognition accuracy and the lowest energy consumption. Its energy use was three orders of magnitude lower than that of two independent state-of-the-art baseline computer vision systems. The dynamic power requirement for the complete system, mapped to commercial off-the-shelf (COTS) hardware that includes a 5.6 Megapixel color camera processed by object detection and classification algorithms at 30 frames per second, was measured at 21.7 Watts (W), for an effective energy consumption of 5.45 nanoJoules (nJ) per bit of incoming video.
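The reported energy-per-bit figure can be sanity-checked from the numbers in the abstract. The bit depth of the camera is not stated, so the sketch below assumes 24-bit (8-bit RGB) color pixels; under that assumption the result comes out close to the reported 5.45 nJ per bit:

```python
# Sanity check of the reported energy-per-bit figure.
# Assumption (not stated in the abstract): 24 bits per pixel (8-bit RGB).
pixels_per_frame = 5.6e6   # 5.6 Megapixel camera
bits_per_pixel = 24        # assumed color depth
fps = 30                   # frames per second
power_w = 21.7             # measured dynamic power in Watts

bit_rate = pixels_per_frame * bits_per_pixel * fps   # incoming bits per second
energy_per_bit_nj = power_w / bit_rate * 1e9         # Joules/bit -> nJ/bit
# energy_per_bit_nj is approximately 5.4 nJ/bit, consistent with the
# reported 5.45 nJ per bit of incoming video.
```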
These unprecedented results show that the NEOVUS has the potential to revolutionize automated video object recognition toward enabling practical low-power and mobile video processing applications.

Highlights

  • Unmanned platforms are becoming one of the major sources of data for intelligence and surveillance both on and off the battlefield

  • We describe the results of the Neuromorphic Visual Understanding of Scenes (NEOVUS) evaluation by the Defense Advanced Research Projects Agency (DARPA) on three urban area video datasets (Tower, Helicopter, and TAILWIND; Figure 8) during summative testing conducted at the end of the Neovision2 program

  • The videos are processed through NEOVUS, and its outputs, in the form of object locations, bounding boxes, and class labels, are logged for every frame


Summary

INTRODUCTION

Unmanned platforms are becoming one of the major sources of data for intelligence and surveillance both on and off the battlefield. Two problems arise from these emerging trends: (1) high bandwidth is required to send data from the platform to ground stations, even with good compression, and (2) a high workload is imposed on analysts and end-users who must process the data. One solution to these problems is to perform on-board automated image and video analysis (e.g., detecting, recognizing, and tracking objects of interest) to enable better and more timely situational awareness, reduce the amount of data to be streamed, and reduce the end-user workload.

ARCHITECTURE

The NEOVUS is a neuromorphic object-recognition architecture and system inspired by the ventral (what) and dorsal (where) streams of the mammalian visual pathway (Mishkin et al., 1983). It is based on and consistent with neuroscience theories and models of the mammalian pathways implicated in visual processing (Mishkin et al., 1983; Huang and Grossberg, 2010). Perceptual boundaries and surfaces (lightness and color) are hypothesized to form; complementary edge and surface processes define boundaries across noise and occlusions, fill in featural properties, and aid in figure-ground separation (Mishkin et al., 1983; Elder and Zucker, 1998; Huang and Grossberg, 2010).
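The feed-forward detection-then-classification structure described above can be illustrated with a deliberately minimal sketch. This is not the NEOVUS implementation: the detector here is simple frame differencing standing in for form- and motion-based detection, and the classifier is a placeholder where NEOVUS uses a convolutional neural network. All function names and thresholds are illustrative assumptions:

```python
import numpy as np

def detect_candidates(prev_frame, frame, thresh=30):
    """Toy motion-based detector: frame differencing yields a binary
    change mask, and changed pixels are grouped into a candidate
    bounding box. Stands in for form/motion-based object detection."""
    diff = np.abs(frame.astype(int) - prev_frame.astype(int))
    ys, xs = np.nonzero(diff > thresh)
    if len(xs) == 0:
        return []
    # One box around all changed pixels (illustrative only; a real
    # detector would segment separate moving regions).
    return [(int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max()))]

def classify(patch):
    """Placeholder classifier: NEOVUS uses a convolutional neural
    network here; this sketch just labels by mean intensity."""
    return "bright" if patch.mean() > 127 else "dark"

def process_frame(prev_frame, frame):
    """Feed-forward pipeline: detect candidate regions, then classify
    each one, emitting (bounding box, class label) pairs per frame."""
    results = []
    for (x0, y0, x1, y1) in detect_candidates(prev_frame, frame):
        patch = frame[y0:y1 + 1, x0:x1 + 1]
        results.append(((x0, y0, x1, y1), classify(patch)))
    return results
```

The key design point the sketch preserves is that detection and classification are separate feed-forward stages: only the candidate regions the detector proposes are passed to the (more expensive) classifier.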

Classification and learning
RESULTS AND DISCUSSION
CONCLUSIONS
