Reference View Research Articles

As humans, we have the capacity to refer to the things in the world around us. In everyday spoken communication, we often use words to describe intended referents (such as objects, people, and events), and our bodies (e.g., eyes, head, and hands) to indicate the location to which our addressee should focus her attention in order to further identify what we are talking about (Bühler, 1934; Clark and Bangerter, 2004). Traditionally, referring has been described as an autonomous and addressee-blind act that speakers do on their own without taking into account beliefs about their addressees' knowledge about a referent (e.g., Olson, 1970; see Clark and Bangerter, 2004). In contrast, more recent views consider it rather a collaborative enterprise that requires that speaker and addressee work together, for instance in reaching mutual agreement on how to conceptualize and name a particular entity (e.g., Clark and Wilkes-Gibbs, 1986; Brennan and Clark, 1996; Clark and Bangerter, 2004). Such agreement is established through interaction, and the addressee is at least as important as the speaker in reaching agreement and establishing reference. In prototypical instances of successful referring, speakers often produce spatial demonstratives like this and that to establish joint attention between speaker and addressee to a visible entity (Bühler, 1934; Levinson, 1983). Such demonstratives are among the most frequently used words in language, among the first words infants produce (Clark and Sengul, 1978), and possibly primordial in phylogeny (Diessel, 2006; Tomasello, 2008). Surprisingly, despite the advances made toward a social, collaborative account of referring more generally, the prevailing theoretical view on spatial demonstratives has remained deeply individual and egocentric, as illustrated by the following claims: “[T]he anchoring point of deictic expressions is egocentric (or, better, speaker-centric). Adult speakers skillfully relate what they are talking about to this me-here-now” (Levelt, 1989, p. 46). Spatial demonstratives “indicate the relative distance of an object, location, or person vis-à-vis the deictic center (…), which is usually associated with the location of the speaker” (Diessel, 1999, p. 36). “[D]emonstratives are interpreted based on the speaker's body” (Diessel, 2014, p. 122). This egocentric account is intuitively appealing and still influential (e.g., Diessel, 2014; Stevens and Zhang, 2014). In the current paper, we question this account from both the production and the comprehension side, and discuss recent accumulating observational, experimental, and neuroscientific evidence that suggests an alternative social and multimodal view of demonstrative reference.

In free viewpoint video systems, a user has the freedom to select a virtual view from which an image of the 3D scene is rendered, and the scene is commonly represented by color and depth images of multiple nearby viewpoints. In such a representation, there exists data redundancy across multiple dimensions: 1) a 3D voxel may be represented by pixels in multiple viewpoint images (inter-view redundancy); 2) a pixel patch may recur in a distant spatial region of the same image due to self-similarity (inter-patch redundancy); and 3) pixels in a local spatial region tend to be similar (inter-pixel redundancy). It is important to exploit these redundancies during inter-view prediction toward effective multiview video compression. In this paper, we propose an encoder-driven inpainting strategy for inter-view predictive coding, where explicit instructions are transmitted minimally, and the decoder is left to independently recover remaining missing data via inpainting, resulting in lower coding overhead. In particular, after pixels in a reference view are projected to a target view via depth-image-based rendering at the decoder, the remaining holes in the target view are filled via an inpainting process in a block-by-block manner. First, blocks are ordered in terms of difficulty-to-inpaint by the decoder. Then, explicit instructions are only sent for the reconstruction of the most difficult blocks. In particular, the missing pixels are explicitly coded via a graph Fourier transform or a sparsification procedure using discrete cosine transform, leading to low coding cost. For blocks that are easy to inpaint, the decoder independently completes missing pixels via template-based inpainting. We apply our proposed scheme to frames in a prediction structure defined by JCT-3V where inter-view prediction is dominant, and experimentally we show that our scheme achieves up to 3 dB gain in peak signal-to-noise ratio (PSNR) in reconstructed image quality over a comparable 3D-High Efficiency Video Coding implementation using a fixed 16 × 16 block size.
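To give a sense of scale for the reported gain, the following is a minimal sketch of the standard PSNR definition, assuming 8-bit samples (peak value 255). The function names `mse` and `psnr` are illustrative and do not come from the paper. Under this definition, halving the mean squared error raises PSNR by about 3 dB.

```python
import math


def mse(reference, reconstructed):
    """Mean squared error between two equal-length sequences of pixel values."""
    if len(reference) != len(reconstructed) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((r - x) ** 2 for r, x in zip(reference, reconstructed)) / len(reference)


def psnr(mean_squared_error, peak=255.0):
    """PSNR in dB; peak=255 is an assumption that samples are 8-bit."""
    if mean_squared_error <= 0:
        raise ValueError("PSNR is undefined for zero or negative MSE")
    return 10.0 * math.log10(peak * peak / mean_squared_error)


# Halving the MSE corresponds to a gain of 10*log10(2) dB, roughly 3 dB.
gain_db = psnr(50.0) - psnr(100.0)
```

Because PSNR is logarithmic in the error, the size of the gain depends only on the ratio of the two MSE values, not on their absolute level.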

Articles published on Reference View

Estimation of Virtual View Synthesis Distortion Toward Virtual View Position.

This and That Revisited: A Social and Multimodal Approach to Spatial Demonstratives.

Optimal reference view selection algorithm for low complexity disparity estimation

Reference View Selection in DIBR-Based Multiview Coding.

Rate Distortion Optimized Inter-View Frame Level Bit Allocation Method for MV-HEVC

Faithful Disocclusion Filling in Depth Image Based Rendering Using Superpixel-Based Inpainting

Encoder-Driven Inpainting Strategy in Multiview Video Compression.

Principles of Transformative Learning Workshop (USAFA January 2015)

Sequential block-based disparity map estimation algorithm for stereoscopic image coding

Influence of management behavior on the skilled labor migrations’ unsafe behavior

Object Files, Properties, and Perceptual Content

View Synthesis Distortion Estimation With a Graphical Model and Recursive Calculation of Probability Distribution

On Teaching Future Time To EFL Learners: Problems And Solutions

Insights gained from three-dimensional imaging modalities for closure of ventricular septal defects.

Improved interview video error concealment on whole frame packet loss

Present and future climatologies in the phase I CREMA experiment

Projection-based disparity control for toed-in multiview images.

Efficient multi-view video coding using inter-view information

Block-Based In-Loop View Synthesis for 3-D Video Coding

Seamless View Synthesis Through Texture Optimization
