New advances in visual computing for intelligent processing of visual media and augmented reality

Songhai Zhang, Juhong Wang, Ralph R Martin

doi:10.1007/s11431-015-5991-0

Abstract

Visual imagery constitutes the most important sensory information for humans. The entire field of acquiring, analyzing and synthesizing visual data by means of computers is called visual computing. It has an extraordinarily wide range of applications, including, for example, industrial quality control, street view and driver assistance systems, robot navigation, multimedia systems, and computer games. Visual computing comprises four key areas: computer vision and image processing, computer graphics, virtual and augmented reality, and visualization. It requires deep, interdisciplinary scientific knowledge, in particular in computer science, mathematics, physics, engineering, and cognitive sciences.

Visual computing tackles high level tasks such as editing and composition of visual content [1] and recognition of semantic content [2–5], as well as basic low level problems such as denoising [6,7] and decomposing [8,9] images, video and 3D shapes.

Denoising is often a first step that provides high quality inputs to more complicated tasks, such as panoramic image stitching to construct a street view database, and image understanding in computer vision. The objective of image denoising is to reduce the noise level while preserving edges and textures as much as possible. The necessary processing may be done in the image domain, or in the frequency domain using the FFT or wavelets. Recent studies have considered how to represent contour information in images using ridgelets, curvelets and contourlets, as well as how to find suitable thresholding schemes to remove noise [6]. There is also corresponding work on mesh denoising, which finds structures in terms of positional and normal features [7].

Image decomposition concerns the splitting of an image into two or more components. A fundamental goal in image analysis and computer vision is to extract meaningful components from an image, for tasks such as scene understanding, generation of visual media and many intelligent applications of visual content.
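The transform-domain thresholding idea mentioned above can be illustrated with a minimal sketch. It is not the scheme of ref. [6]: it assumes a one-level orthonormal Haar wavelet transform applied to a 1D signal of even length, with soft thresholding of the detail coefficients. The function names and the threshold value are illustrative assumptions only.

```python
import math

_S = 1.0 / math.sqrt(2.0)  # normalisation factor of the orthonormal Haar transform


def haar_forward(signal):
    """One-level Haar transform: returns (approximation, detail) coefficients.

    Assumes len(signal) is even.
    """
    approx = [(signal[i] + signal[i + 1]) * _S for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * _S for i in range(0, len(signal), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Invert haar_forward, interleaving the reconstructed samples."""
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) * _S)
        out.append((a - d) * _S)
    return out


def soft_threshold(coeffs, t):
    """Shrink each coefficient towards zero by t; small ones become zero."""
    return [math.copysign(max(abs(c) - t, 0.0), c) for c in coeffs]


def denoise(signal, t):
    """Threshold only the detail coefficients, then reconstruct."""
    approx, detail = haar_forward(signal)
    return haar_inverse(approx, soft_threshold(detail, t))


if __name__ == "__main__":
    # Hypothetical noisy step signal: a level near 1 followed by a level near 5.
    noisy = [1.0, 1.2, 0.9, 1.1, 5.0, 5.1, 4.9, 5.2]
    print([round(v, 2) for v in denoise(noisy, t=0.5)])
```

In this toy input the small fluctuations within each pair of samples fall below the threshold and are smoothed away, while the large step between the two levels is carried by the approximation coefficients and survives. For images, the same transform would typically be applied along rows and columns and over several scales.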
Various image decomposition strategies exist. One approach is cartoon and texture decomposition: ref. [9] gives a survey of existing decomposition models and extends the nonlinear filter method to decompose an image into three components: the cartoon component (the main geometric structures), the oscillatory component (texture), and noise. An image can also be decomposed into a lighting image and a reflectance image, known as intrinsic images. Automatic intrinsic image decomposition remains a significant challenge, particularly for real-world scenes. Recent advances on this longstanding problem rely on data-driven methods based on large scale datasets of ground truth data. For example, ref. [8] built a dataset of more than 5000 labeled intrinsic images as a basis for a decomposition algorithm, as demonstrated in Figure 1.

Computer understanding of scenes, or of objects within them, is a fundamental problem in computer vision, and is crucial to intelligent applications based on visual content. For example, efficient object recognition and scene understanding in live video captured by cameras on a car allows a driver assistance system to instantly feed appropriate information to the driver, or directly to the car's controls.
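For reference, the two decomposition models described above are commonly written as follows. The symbols are notation assumed here for illustration and are not taken from refs. [8,9]: f is the observed image, u the cartoon component, v the texture, and w the noise; I is the observed intensity at pixel x, R the reflectance, and S the lighting (shading).

```latex
% Cartoon + texture + noise decomposition of an observed image f
\[ f = u + v + w \]

% Intrinsic image decomposition at each pixel x (multiplicative model),
% which becomes additive in the log domain
\[ I(x) = R(x)\, S(x), \qquad \log I(x) = \log R(x) + \log S(x) \]
```

Both problems are underdetermined, since many pairs or triples of components reproduce the same input. This is one reason why additional priors or, as in ref. [8], ground truth training data are needed.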
