Abstract

When capturing photographs with a digital camera, the resulting images are inherently affected by noise. Image denoising, i.e. the task of recovering the underlying clean image from a noisy observation, is fundamental for improving perceptual quality, supporting further visual reasoning, and guiding the optimization of more general image restoration tasks. Since image noise is a stochastic phenomenon arising from different sources, such as the randomness of the photon arrival process or the electronic circuits on the camera chip, recovering the exact noiseless image is in general not possible. The challenge of the image denoising problem thus lies in imposing suitable assumptions on both the formation process of the noisy image as well as on the properties of the clean images that we want to recover. These assumptions are either encoded explicitly within a mathematical framework that yields the denoised image as the solution of an optimization problem, or implicitly by choosing a discriminative model, e.g. a convolutional neural network (CNN), trained on pairs of clean and noisy images. Having defined a denoising algorithm, it is natural to ask how to assess the quality of its output. Here, the research community by and large relies on synthetic test data for quantitative evaluation, where supposedly noiseless images are corrupted by simulated noise. However, evaluating on simulated data is only a proxy for assessing accuracy on realistic images. The first contribution of this dissertation addresses this gap by proposing a novel methodology for creating realistic test data for image denoising. Specifically, we propose to capture pairs of real noisy and almost noiseless reference images, and we show how to extract accurate ground truth from the reference image by taking the underlying image formation process into account.
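The synthetic evaluation protocol described above can be sketched in a few lines: a supposedly clean image is corrupted with a simple signal-dependent noise model, here a heteroscedastic Gaussian approximating Poisson shot noise plus Gaussian read noise. The parameters `a` and `b` are purely illustrative, not calibrated sensor values.

```python
import numpy as np

def add_poisson_gaussian_noise(clean, a=0.01, b=0.0002, rng=None):
    """Simulate signal-dependent sensor noise on a clean image in [0, 1].

    Approximates Poisson shot noise plus Gaussian read noise by a
    heteroscedastic Gaussian whose variance a * x + b grows with the
    intensity x.  The parameters a and b are illustrative only.
    """
    rng = np.random.default_rng(rng)
    variance = a * clean + b
    noisy = clean + rng.normal(0.0, np.sqrt(variance))
    return np.clip(noisy, 0.0, 1.0)

# Corrupt a synthetic intensity ramp standing in for a "clean" test image.
clean = np.linspace(0.0, 1.0, 256).reshape(16, 16)
noisy = add_poisson_gaussian_noise(clean, rng=0)
```

Noise simulated this way only matches real sensor noise to the extent that the assumed variance model holds, which is precisely why real paired test data is needed.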
Since the image denoising problem is inherently ill-posed, it is worthwhile to go beyond predicting a single possible outcome and to additionally assess the uncertainty of the prediction. Probabilistic approaches to image denoising naturally lend themselves to uncertainty estimation since they model the posterior distribution of denoised images given the noisy observation. However, inferring the quantities of interest, e.g. the marginal entropy at each pixel, is often infeasible. Our second contribution proposes a novel stochastic variational inference (SVI) algorithm that fits a variational approximation (Wainwright and Jordan, 2008) to estimate model-based uncertainty at the pixel level. We demonstrate that the resulting algorithm, SVIGL, is on par with or even outperforms the strong baseline of SVI with the popular Adam optimizer (Kingma and Ba, 2015) in terms of speed, robustness, and accuracy. In this thesis we are also concerned with advancing the state of the art in raw denoising accuracy. Currently, neural network based approaches yield the most powerful denoisers, while among more traditional methods, non-local approaches (Dabov et al., 2006) remain competitive. To combine the best of both worlds, our third contribution endows a strong CNN denoiser with a novel block matching layer, called the neural nearest neighbors (N3) block, for which we propose a fully differentiable relaxation of the k-nearest neighbor (KNN) selection rule. This allows the network to optimize the feature space on which block matching is conducted. Our N3 block is applicable to general input domains, as exemplified by the set reasoning task of correspondence classification. While the aforementioned parts of this dissertation deal with the common case of a saturating camera sensor, i.e. one whose intensity values increase only up to a maximal value, we also consider a novel sensor concept called the modulo sensor (Zhao et al., 2015), which is promising for high dynamic range (HDR) imaging.
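The core idea of a differentiable relaxation of KNN selection can be illustrated as follows: rather than a hard, non-differentiable top-k pick, each "neighbor" is a softmax-weighted average over all candidates, and already-selected candidates are suppressed in subsequent rounds. This is a simplified sketch of the idea only, not the exact N3 block; the function name, temperature value, and suppression scheme below are illustrative choices.

```python
import numpy as np

def soft_knn(query, database, k=2, temperature=0.1):
    """Continuous relaxation of k-nearest-neighbor selection.

    Each round computes softmax weights over negative squared distances
    and returns a weighted average of the database items; the logits of
    already-selected items are then suppressed via log(1 - w).  As the
    temperature goes to zero, the weights approach hard kNN selection.
    """
    d2 = np.sum((database - query) ** 2, axis=1)   # squared distances
    alpha = -d2                                    # similarity logits
    neighbors = []
    for _ in range(k):
        logits = alpha / temperature
        w = np.exp(logits - logits.max())
        w /= w.sum()                               # softmax weights
        neighbors.append(w @ database)             # "soft" neighbor
        alpha = alpha + np.log1p(-w + 1e-12)       # suppress selected items
    return np.stack(neighbors)

database = np.array([[0.0], [1.0], [10.0]])
query = np.array([0.2])
nbrs = soft_knn(query, database, k=2, temperature=0.05)
```

Because every step is a smooth function of the distances, gradients can flow back into the feature space in which the distances are computed, which is exactly what enables learning the matching features end to end.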
In a modulo sensor, pixel elements reset once they reach their maximal value. To obtain a plausible image, we need to infer how often each pixel was reset during the exposure. In our fourth contribution we reconstruct this information from multiple noisy modulo images. We propose to faithfully model the image formation process and use this generative model in an energy minimization framework to obtain a reconstructed and denoised HDR image, outperforming prior approaches to reconstruction from multiple modulo images.
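The modulo image formation model can be sketched as follows. In the actual contribution the reset counts are unknown and must be inferred from multiple noisy modulo observations via energy minimization; this toy example simply records them to make the wrap-around behavior concrete. The full-well value of 256 is illustrative, not a real sensor specification.

```python
import numpy as np

MAX_VALUE = 256  # illustrative full-well value, not a real sensor spec

def modulo_capture(irradiance, max_value=MAX_VALUE):
    """Simulate a modulo sensor: pixels wrap around instead of saturating."""
    wraps = irradiance // max_value      # hidden number of resets per pixel
    observed = irradiance % max_value    # what the sensor actually records
    return observed, wraps

def unwrap(observed, wraps, max_value=MAX_VALUE):
    """Recover the HDR irradiance once the reset counts are known."""
    return observed + wraps * max_value

scene = np.array([100, 300, 700])        # HDR scene exceeding the range
observed, wraps = modulo_capture(scene)
```

Recovering `wraps` from `observed` alone is ambiguous for a single noisy image, which is why combining multiple modulo exposures within a generative model is attractive.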
