Overview and Empirical Analysis of ISP Parameter Tuning for Visual Perception in Autonomous Driving.

Lucie Yahiaoui, Ciarán Hughes, Senthil Yogamani, Patrick Denny, Jonathan Horgan, Brian Deegan

doi:10.3390/jimaging5100078

Abstract

Image quality is a well-understood concept for human viewing applications, particularly in the multimedia space, and increasingly in an automotive context as well. The rise of autonomous driving and computer vision brings to the fore research into the impact of image quality on camera perception for tasks such as recognition, localization and reconstruction. While “image quality” for computer vision may be ill-defined, it is clear that the configuration of the image signal processing pipeline is the key factor in controlling it. This paper is partly a review and partly a position paper, with a demonstration of several preliminary results that are promising for future research. We give an overview of the Image Signal Processor (ISP) pipeline, describe some typical automotive computer vision problems, and briefly introduce, via empirical results, the impact of image signal processing parameters on computer vision performance. We also discuss the merits of automatically tuning the ISP parameters using computer vision performance indicators as a cost metric, thereby bypassing the need to explicitly define what “image quality” means for computer vision. Because datasets for ISP tuning experiments are lacking, we apply proxy algorithms such as sharpening before the vision algorithm. We performed these experiments with a classical algorithm, AKAZE, and a machine learning algorithm for pedestrian detection. We obtain encouraging results, such as a 14% improvement in pedestrian detection accuracy from tuning the parameters of the sharpening technique. We hope that this encourages the creation of such datasets for a more systematic evaluation of these topics.
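The tuning idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the unsharp-mask sharpener, the exhaustive grid search, and the function names (box_blur, unsharp_mask, mean_abs_gradient, tune) are all assumptions made for illustration. In particular, mean_abs_gradient is only a hypothetical stand-in for a real vision performance indicator such as detection accuracy or the number of correct feature matches.

```python
from itertools import product


def box_blur(img, radius):
    """Mean filter with edge replication on a 2D list of numbers."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total, count = 0.0, 0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy = min(max(y + dy, 0), h - 1)  # clamp to image border
                    xx = min(max(x + dx, 0), w - 1)
                    total += img[yy][xx]
                    count += 1
            out[y][x] = total / count
    return out


def unsharp_mask(img, radius, amount):
    """Proxy sharpening: img + amount * (img - blur), clipped to [0, 255]."""
    blurred = box_blur(img, radius)
    return [
        [min(255.0, max(0.0, p + amount * (p - b))) for p, b in zip(row, brow)]
        for row, brow in zip(img, blurred)
    ]


def mean_abs_gradient(img):
    """Hypothetical stand-in for a vision KPI: mean horizontal gradient."""
    diffs = [abs(row[x + 1] - row[x]) for row in img for x in range(len(row) - 1)]
    return sum(diffs) / len(diffs)


def tune(img, score_fn, radii, amounts):
    """Exhaustive search over (radius, amount); returns (score, radius, amount)."""
    best = None
    for radius, amount in product(radii, amounts):
        score = score_fn(unsharp_mask(img, radius, amount))
        if best is None or score > best[0]:
            best = (score, radius, amount)
    return best
```

Calling tune(image, score_fn, radii, amounts) returns the best-scoring parameter pair. In a real experiment, score_fn would run the vision algorithm on the processed images and return its accuracy, so the search never needs an explicit definition of image quality.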

Highlights

The fundamental concepts of image and video quality are well understood in consumer electronics, in the multimedia context [1], and are the subject of standardization [2,3].
Pixels in the image are tracked or matched from one frame to the next using either sparse or dense optical flow or feature extraction and matching techniques. This is the main step that happens in the image domain, and it is commonly accomplished by feature matching algorithms such as Scale-Invariant Feature Transform (SIFT) and AKAZE [47]; feature matching will be one of the main algorithms we evaluate for the impact of the Image Signal Processor (ISP).
Inspecting the images after edge detection shows that the Sobel images of the histogram-equalized images are very similar to those of the original, whereas with Contrast Limited Adaptive Histogram Equalization (CLAHE) noise is detected as edges.
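The edge-detection comparison in the last highlight can be sketched as follows. This is a hedged illustration, not the paper's experiment: it implements only global histogram equalization and a 3x3 Sobel gradient magnitude in plain Python, and the function names (equalize_hist, sobel_magnitude) are assumptions. CLAHE, which equalizes per tile with a clipped histogram, is not implemented here.

```python
import math


def equalize_hist(img, levels=256):
    """Global histogram equalization of a 2D list of ints in [0, levels)."""
    hist = [0] * levels
    for row in img:
        for p in row:
            hist[p] += 1
    # Cumulative distribution of grey levels.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    total = cdf[-1]
    cdf_min = next(c for c in cdf if c > 0)
    if total == cdf_min:  # single grey level: nothing to stretch
        return [row[:] for row in img]
    scale = (levels - 1) / (total - cdf_min)
    lut = [round(max(c - cdf_min, 0) * scale) for c in cdf]
    return [[lut[p] for p in row] for row in img]


def sobel_magnitude(img):
    """3x3 Sobel gradient magnitude; border pixels are left at 0."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]) - (
                img[y - 1][x - 1] + 2 * img[y][x - 1] + img[y + 1][x - 1]
            )
            gy = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1]) - (
                img[y - 1][x - 1] + 2 * img[y - 1][x] + img[y - 1][x + 1]
            )
            out[y][x] = math.hypot(gx, gy)
    return out
```

Running sobel_magnitude on an image and on its equalize_hist output gives two edge maps that can be compared pixel by pixel. On a clean low-contrast step edge, the edges land at the same positions and only their strength changes.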

Summary

Introduction

The fundamental concepts of image and video quality are well understood in consumer electronics, in the multimedia context [1], and are the subject of standardization [2,3]. What “good quality” means, however, is less straightforward, and no single clear definition is available [4,5]. This is compounded by the fact that video is needed for two distinct applications: display to the driver (e.g., rear view and multi-camera surround view monitoring) and computer vision for advanced driver assistance systems. This is pertinent in the move towards autonomous driving platforms, where camera systems are a diverse and key component. We provide some background on ISP architectures and computer vision, with the aim of giving the reader sufficient context to appreciate the remainder of the paper.

Objectives

Methods

Results

Discussion

Conclusion