A Framework for Scene-Flow Driven Creation of Time-Consistent Dynamic 3D Objects using Mesh Parametrizations

Abstract

In this paper, we propose a novel method for creating dynamic mesh sequences with fixed connectivity from multi-camera video streams. Fixed connectivity benefits dynamic mesh coding because the connectivity has to be transmitted only once rather than for every frame. The proposed method runs automatically. It combines mesh parametrizations and Voronoi diagrams from computer graphics with 2D optical flow and 3D scene flow from computer vision. Because a given dynamic mesh sequence does not necessarily have constant connectivity, motion is estimated by applying a 3D motion field to a static reconstruction in one frame. Mesh connectivity is then transferred patch-wise from one frame to the next using remeshing techniques. The entire method is independent of the static mesh resolution and thus provides a basis for dynamic mesh coding and simplification.
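The idea of applying a 3D motion field to a static reconstruction while keeping connectivity fixed can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `propagate_mesh` and the per-vertex displacement format are assumptions made for illustration, and the patch-wise remeshing step is not shown.

```python
import numpy as np

def propagate_mesh(vertices0, faces, flows):
    """Advect frame-0 vertices through per-frame 3D displacement fields.

    vertices0: (V, 3) vertex positions of the static reconstruction.
    faces:     (F, 3) vertex indices, shared unchanged by every frame.
    flows:     iterable of (V, 3) per-vertex scene-flow displacements,
               one per frame transition (hypothetical input format).
    """
    frames = [np.asarray(vertices0, dtype=float)]
    for flow in flows:
        # Only positions change; the face list is never touched.
        frames.append(frames[-1] + np.asarray(flow, dtype=float))
    return np.stack(frames), np.asarray(faces)

# Toy example: one triangle translated along x in two steps.
verts = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
tris = np.array([[0, 1, 2]])
step = np.tile([0.5, 0.0, 0.0], (3, 1))
sequence, connectivity = propagate_mesh(verts, tris, [step, step])
```

Because `connectivity` is returned once for the whole sequence, a coder would only need to transmit the per-frame vertex positions after the first frame.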

Similar Papers
  • Research Article
  • 10.7939/r3h30j
Multiview video compression
  • Jan 1, 2009
  • University of Alberta Library
  • Baochun Bai

With the progress of computer graphics and computer vision technologies, 3D/multiview video applications such as 3D-TV and tele-immersive conferencing are becoming increasingly popular and are very likely to emerge as a prime application in the near future. A successful 3D/multiview video system needs synergistic integration of various technologies such as 3D/multiview video acquisition, compression, transmission and rendering. In this thesis, we focus on addressing the challenges of multiview video compression. In particular, we have made five major contributions: (1) We propose a novel neighbor-based multiview video compression system which helps remove the inter-view redundancies among multiple video streams and improves performance. An optimal stream encoding order algorithm is designed to enable the encoder to automatically decide the stream encoding order and find the best reference streams. (2) A novel multiview video transcoder is designed and implemented. The proposed multiview video transcoder can be used to encode multiple compressed video streams and reduce the cost of multiview video acquisition systems. (3) A learning-based multiview video compression scheme is invented. The novel multiview video compression algorithms build on recent advances in semi-supervised learning algorithms and achieve compression by finding a sparse representation of images. (4) Two novel distributed source coding algorithms, EETG and SNS-SWC, are put forward. Both EETG and SNS-SWC are syndrome-based schemes capable of achieving the whole Slepian-Wolf rate region. EETG simplifies the code construction algorithm for distributed source coding schemes using an extended Tanner graph and is able to handle mismatched bits at the encoder. SNS-SWC has two independent decoders and thus can simplify the decoding process. (5) We propose a novel distributed multiview video coding scheme which allows flexible rate allocation between two distributed multiview video encoders. SNS-SWC is used as the underlying Slepian-Wolf coding scheme. It is the first work to realize simultaneous Slepian-Wolf coding of stereo videos with the help of a distributed source code that achieves the whole Slepian-Wolf rate region. The proposed scheme has a better rate-distortion performance than the separate H.264 coding scheme in the high-rate case.
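For reference, the Slepian-Wolf rate region mentioned above is the standard result for two correlated sources that are encoded separately and decoded jointly. The symbols $X$, $Y$, $R_X$ and $R_Y$ are introduced here for illustration and do not appear in the abstract:

```latex
\begin{aligned}
R_X &\ge H(X \mid Y), \\
R_Y &\ge H(Y \mid X), \\
R_X + R_Y &\ge H(X, Y).
\end{aligned}
```

Achieving the whole region means reaching any rate pair on the boundary $R_X + R_Y = H(X, Y)$, not only the two corner points, which is what makes flexible rate allocation between the two encoders possible.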

  • Conference Article
  • Citations: 2
  • 10.1109/ccdc49329.2020.9164266
Synthesis of multiple video streams through multi-thread programming
  • Aug 1, 2020
  • Jian Wang + 4 more

In some scenarios, multiple video streams and pictures need to be combined into a single video. In this paper, we develop an application that synthesizes two channels of video streams and two channels of pictures. Our application can be used for the synthetic display of multiple network video streams and for offline storage of synthesized video. Because single-threaded decoding of multiple video streams takes a lot of time, we propose the use of a multithreaded producer-consumer pattern. It takes full advantage of multi-core CPUs and improves program efficiency. In our experiments, we compare single-threaded and multithreaded video synthesis. Multithreading is much faster than single threading.
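A producer-consumer arrangement of this kind can be sketched as follows. This is a minimal illustration, not the authors' application: the functions `decode_stream` and `compose` are hypothetical, and strings stand in for decoded frames. In CPython, a real speed-up additionally depends on the decoder releasing the interpreter lock while it works.

```python
import queue
import threading

def decode_stream(stream_id, n_frames, out_queue):
    """Producer: stands in for a decoder; emits (stream_id, index, frame)."""
    for i in range(n_frames):
        out_queue.put((stream_id, i, f"frame-{stream_id}-{i}"))
    out_queue.put((stream_id, None, None))  # sentinel: this stream is done

def compose(n_streams, in_queue, results):
    """Consumer: groups frames by index until every producer has finished."""
    finished = 0
    pending = {}
    while finished < n_streams:
        stream_id, index, frame = in_queue.get()
        if index is None:
            finished += 1
            continue
        pending.setdefault(index, {})[stream_id] = frame
        if len(pending[index]) == n_streams:
            tiles = pending.pop(index)
            # One composited output frame: the tiles of all streams in order.
            results.append((index, [tiles[s] for s in range(n_streams)]))

n_streams, n_frames = 4, 5
frame_queue = queue.Queue(maxsize=16)  # bounded, so producers cannot run far ahead
composited = []

consumer = threading.Thread(target=compose,
                            args=(n_streams, frame_queue, composited))
producers = [threading.Thread(target=decode_stream,
                              args=(s, n_frames, frame_queue))
             for s in range(n_streams)]

consumer.start()
for p in producers:
    p.start()
for p in producers:
    p.join()
consumer.join()
```

The bounded queue decouples decoding from composition: each decoder thread runs at its own pace, and the consumer emits an output frame as soon as all streams have delivered the matching index.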

  • Conference Article
  • Citations: 48
  • 10.1145/769953.769954
Blue-c
  • May 22, 2003
  • M Gross

We present blue-c, a new generation immersive projection and 3D video acquisition environment for virtual design and collaboration. It combines simultaneous acquisition of multiple live video streams with advanced 3D projection technology in a CAVE™-like environment, creating the impression of total immersion. The blue-c portal currently consists of three rectangular projection screens that are built from glass panels containing liquid crystal layers. These screens can be switched from a whitish opaque state (for projection) to a transparent state (for acquisition), which allows the video cameras to “look through” the walls. Our projection technology is based on active stereo using two LCD projectors per screen. The projectors are synchronously shuttered along with the screens, the stereo glasses, active illumination devices, and the acquisition hardware. From multiple video streams, we compute a 3D video representation of the user in real time. The resulting video inlays are integrated into a networked virtual environment. Our design is highly scalable, enabling blue-c to connect to portals with less sophisticated hardware.

  • Research Article
  • Citations: 164
  • 10.1145/882262.882350
Blue-c
  • Jul 1, 2003
  • ACM Transactions on Graphics
  • Markus Gross + 12 more

We present blue-c, a new immersive projection and 3D video acquisition environment for virtual design and collaboration. It combines simultaneous acquisition of multiple live video streams with advanced 3D projection technology in a CAVE™-like environment, creating the impression of total immersion. The blue-c portal currently consists of three rectangular projection screens that are built from glass panels containing liquid crystal layers. These screens can be switched from a whitish opaque state (for projection) to a transparent state (for acquisition), which allows the video cameras to "look through" the walls. Our projection technology is based on active stereo using two LCD projectors per screen. The projectors are synchronously shuttered along with the screens, the stereo glasses, active illumination devices, and the acquisition hardware. From multiple video streams, we compute a 3D video representation of the user in real time. The resulting video inlays are integrated into a networked virtual environment. Our design is highly scalable, enabling blue-c to connect to portals with less sophisticated hardware.

  • Conference Article
  • Citations: 277
  • 10.1145/1201775.882350
Blue-c
  • Jul 1, 2003
  • Markus Gross + 12 more

We present blue-c, a new immersive projection and 3D video acquisition environment for virtual design and collaboration. It combines simultaneous acquisition of multiple live video streams with advanced 3D projection technology in a CAVE™-like environment, creating the impression of total immersion. The blue-c portal currently consists of three rectangular projection screens that are built from glass panels containing liquid crystal layers. These screens can be switched from a whitish opaque state (for projection) to a transparent state (for acquisition), which allows the video cameras to look through the walls. Our projection technology is based on active stereo using two LCD projectors per screen. The projectors are synchronously shuttered along with the screens, the stereo glasses, active illumination devices, and the acquisition hardware. From multiple video streams, we compute a 3D video representation of the user in real time. The resulting video inlays are integrated into a networked virtual environment. Our design is highly scalable, enabling blue-c to connect to portals with less sophisticated hardware.

  • Conference Article
  • Citations: 1
  • 10.1109/cisis.2013.28
Personalising Multi Video Streams and Camera Views for Live Events
  • Jul 1, 2013
  • Zhenchen Wang

An innovative Internet streaming video player, which we call ePlayer, that supports personalised camera switching for live events has been researched, developed and evaluated. The main novelty of the system is that, in distributing the available bandwidth amongst multiple input video streams, it can automatically coordinate and preserve the video quality across multiple live video streams. An additional novelty of the system is that it supports automatic camera switching based upon an individual user's preferences. The experimental results indicate that the system is able to effectively allocate a contested bandwidth resource amongst multiple streams and is able to infer a user's camera switching preferences by dynamically predicting the user's switching intervals amongst multiple cameras.

  • Conference Article
  • 10.1109/icufn.2012.6261740
Power efficient routing algorithm for 3D video transmission in bandwidth-limited and radio-limited wireless networks
  • Jul 1, 2012
  • Hong-Hsu Yen + 1 more

3D video transmission is a new and challenging research issue in wireless networks. A 3D video stream is composed of multiple 2D video streams that together create the 3D viewing experience at the receiver. Compared with a 2D video stream, a 3D video stream requires significantly more bandwidth. In wireless networks, however, the limited frequency spectrum makes it difficult to transmit the full 3D video stream in one channel. One solution is traffic bifurcation: the 3D video is split into multiple 2D video streams, each 2D stream is transmitted over a separate routing path, and the 3D video is recombined and reconstructed at the destination node. With this scheme, however, the relay nodes on different routing paths might need to coordinate their modulation schemes and transmission power so that the destination node receives the same video quality on every path and can successfully reconstruct the 3D video. In addition, the routing decision needs to consider the radio resources at each node, because each node on the routing paths might transmit or receive multiple 2D video streams. Besides frequency spectrum and radio resources, another key performance measure is power consumption, because most nodes in wireless networks are battery driven. This is a challenging cross-layer design problem that involves SNR-aware power control in the physical layer, cooperative routing in the network layer, and video-quality awareness in the application layer, with the goal of delivering the 3D video in a power-efficient way without violating the frequency spectrum and radio resource constraints. I propose a novel IVIP heuristic to tackle this problem. In the computational experiments, the proposed IVIP heuristic outperforms the other heuristics in all tested cases.

  • Conference Article
  • Citations: 63
  • 10.1109/cvpr.2001.991044
On 3D scene flow and structure estimation
  • Dec 1, 2001
  • Ye Zhang + 1 more

In this paper, novel algorithms computing dense 3D scene flow from multiview image sequences are described. A new hierarchical rule-based stereo matching algorithm is presented to estimate the initial disparity map. Different available constraints under a multiview camera setup are investigated and then utilized in the proposed motion estimation algorithms. We show two different formulations for 3D scene flow computation. One formulation assumes that initial disparity map is accurate while the other does not make this assumption. Image segmentation information is used to maintain the motion and depth discontinuities. Iterative implementations are used to successfully compute 3D scene flow and structure at every point in the reference image. Novel hard constraints are introduced in this paper to make the algorithms more accurate and robust. Promising experimental results are seen by applying our algorithms to real imagery.

  • Research Article
  • Citations: 13
  • 10.1109/tcsvt.2016.2610078
A Hybrid Approach for Facial Performance Analysis and Editing
  • Apr 1, 2017
  • IEEE Transactions on Circuits and Systems for Video Technology
  • Wolfgang Paier + 3 more

As of today, fine-grained editing of facial performances in movie and video production requires either retouching every single frame or creating a highly detailed CGI model of the actor, both of which are restricted to high-budget productions. In this paper, we present an example-based approach for facial performance editing that achieves realistic results with standard equipment and very little manual intervention. Based on a model-free surface tracking approach, temporally consistent dynamic texture sequences are extracted from multiple video streams. Using such geometry-plus-texture sequences allows transferring facial expressions/performances between videos, which enables the editor, for example, to change the facial expression while leaving the rest of the video untouched. Moreover, concatenating and/or looping these sequences in a motion-graph-like manner offers a convenient way of composing novel facial performances from multiple source videos. Finally, we present a blending method to seamlessly concatenate texture/mesh sequences and to insert a composed facial performance into a target video.

  • Research Article
  • Citations: 10
  • 10.1111/cgf.13934
Subdivision‐Specialized Linear Algebra Kernels for Static and Dynamic Mesh Connectivity on the GPU
  • May 1, 2020
  • Computer Graphics Forum
  • D Mlakar + 5 more

Subdivision surfaces have become an invaluable asset in production environments. While progress over the last years has allowed the use of graphics hardware to meet performance demands during animation and rendering, high performance is limited to immutable mesh connectivity scenarios. Motivated by recent progress in mesh data structures, we show how the complete Catmull‐Clark subdivision scheme can be abstracted in the language of linear algebra. While this high‐level formulation allows for a fully parallel implementation with significant performance gains, the underlying algebraic operations require further specialization for modern parallel hardware. Integrating domain knowledge about the mesh matrix data structure, we replace costly general linear algebra operations like matrix‐matrix multiplication by specialized kernels. By further considering innate properties of Catmull‐Clark subdivision, like the quad‐only structure after refinement, we achieve an additional order of magnitude in performance and significantly reduce memory footprints. Our approach can be adapted seamlessly for different use cases, such as regular subdivision of dynamic meshes, fast evaluation for immutable topology and feature‐adaptive subdivision for efficient rendering of animated models. In this way, patchwork solutions are avoided in favor of a streamlined solution with consistent performance gains throughout the production pipeline. The versatility of the sparse matrix linear algebra abstraction underlying our work is further demonstrated by extension to other schemes such as √3 and Loop subdivision.
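To give a flavour of what expressing subdivision in linear algebra can look like, the sketch below computes only the first Catmull-Clark step, the face points, as a matrix product. It is a simplification under stated assumptions: it uses a dense face-vertex incidence matrix rather than the paper's mesh matrix data structure or its specialized GPU kernels, and the function name `face_points` is hypothetical.

```python
import numpy as np

def face_points(vertices, faces):
    """Catmull-Clark face points as (row-normalized incidence) @ positions."""
    incidence = np.zeros((len(faces), len(vertices)))
    for f, face in enumerate(faces):
        incidence[f, face] = 1.0          # 1 where vertex belongs to face f
    # Dividing each row by the face's vertex count turns the product
    # into the average of that face's vertices.
    averaging = incidence / incidence.sum(axis=1, keepdims=True)
    return averaging @ vertices

# A unit quad with a triangle attached to its right edge.
verts = np.array([[0.0, 0.0, 0.0],
                  [1.0, 0.0, 0.0],
                  [1.0, 1.0, 0.0],
                  [0.0, 1.0, 0.0],
                  [2.0, 0.5, 0.0]])
faces = [[0, 1, 2, 3], [1, 4, 2]]
fp = face_points(verts, faces)
```

Here `fp[0]` is the quad's centroid and `fp[1]` the triangle's. When only vertex positions change between frames, the averaging matrix can be built once and reused, which is the kind of saving a fixed-connectivity formulation makes possible.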

  • Research Article
  • Citations: 6
  • 10.1016/j.cma.2021.114222
Krylov subspace recycling for evolving structures
  • Jan 13, 2022
  • Computer Methods in Applied Mechanics and Engineering
  • M Bolten + 3 more


  • Conference Article
  • Citations: 89
  • 10.1109/cvpr42600.2020.00141
Upgrading Optical Flow to 3D Scene Flow Through Optical Expansion
  • Jun 1, 2020
  • Gengshan Yang + 1 more

We describe an approach for upgrading 2D optical flow to 3D scene flow. Our key insight is that dense optical expansion, which can be reliably inferred from monocular frame pairs, reveals changes in depth of scene elements, e.g., things moving closer will get bigger. When integrated with camera intrinsics, optical expansion can be converted into normalized 3D scene flow vectors that provide meaningful directions of 3D movement, but not their magnitude (due to an underlying scale ambiguity). Normalized scene flow can be further “upgraded” to the true 3D scene flow knowing depth in one frame. We show that dense optical expansion between two views can be learned from annotated optical flow maps or unlabeled video sequences, and applied to a variety of dynamic 3D perception tasks including optical scene flow, LiDAR scene flow, time-to-collision estimation and depth estimation, often demonstrating significant improvement over the prior art.
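The conversion from optical flow and expansion to normalized scene flow can be sketched for a single pixel. This is an illustrative sketch, not the authors' network or code. It assumes a pinhole camera with intrinsic matrix `K`, and it assumes that apparent size scales inversely with depth, so that an expansion factor `s` implies a depth ratio `Z2 / Z1 = 1 / s`. The function name and argument format are hypothetical.

```python
import numpy as np

def normalized_scene_flow(pixel, flow, expansion, K):
    """Return the 3D motion of one pixel divided by its first-frame depth.

    pixel:     (u, v) location in frame 1.
    flow:      (du, dv) optical flow to frame 2.
    expansion: apparent scale ratio frame 2 / frame 1 (>1 means "got bigger").
    K:         3x3 pinhole intrinsic matrix.
    """
    K_inv = np.linalg.inv(K)
    p1 = np.array([pixel[0], pixel[1], 1.0])
    p2 = np.array([pixel[0] + flow[0], pixel[1] + flow[1], 1.0])
    depth_ratio = 1.0 / expansion  # assumed relation Z2 / Z1 = 1 / s
    # P1 = Z1 * K^-1 p1 and P2 = Z2 * K^-1 p2, so (P2 - P1) / Z1 is:
    return depth_ratio * (K_inv @ p2) - K_inv @ p1

K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
# A point at the image centre that does not move in the image
# but appears 25% larger in the second frame.
direction = normalized_scene_flow((320.0, 240.0), (0.0, 0.0), 1.25, K)
true_flow = 10.0 * direction  # "upgrade" with a known first-frame depth of 10
```

In this example `direction` points straight towards the camera, and only the multiplication by a known depth fixes its metric length, which mirrors the scale ambiguity described in the abstract.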

  • Research Article
  • Citations: 227
  • 10.1016/j.isprsjprs.2017.09.013
Object Scene Flow
  • Nov 22, 2017
  • ISPRS Journal of Photogrammetry and Remote Sensing
  • Moritz Menze + 2 more


  • Research Article
  • Citations: 50
  • 10.1016/j.cviu.2007.04.002
Multi-scale 3D scene flow from binocular stereo sequences
  • May 13, 2007
  • Computer Vision and Image Understanding
  • Rui Li + 1 more


  • Book Chapter
  • Citations: 28
  • 10.1007/978-3-540-24671-8_29
Weighted Minimal Hypersurfaces and Their Applications in Computer Vision
  • Jan 1, 2004
  • Bastian Goldlücke + 1 more

Many interesting problems in computer vision can be formulated as a minimization problem for an energy functional. If this functional is given as an integral of a scalar-valued weight function over an unknown hypersurface, then the minimal surface we are looking for can be determined as a solution of the functional’s Euler-Lagrange equation. This paper deals with a general class of weight functions that may depend on the surface point and normal. By making use of a mathematical tool called the method of the moving frame, we are able to derive the Euler-Lagrange equation in arbitrary-dimensional space and without the need for any surface parameterization. Our work generalizes existing proofs, and we demonstrate that it yields the correct evolution equations for a variety of previous computer vision techniques which can be expressed in terms of our theoretical framework. In addition, problems involving minimal hypersurfaces in dimensions higher than three, which were previously impossible to solve in practice, can now be introduced and handled by generalized versions of existing algorithms. As one example, we sketch a novel idea for reconstructing temporally coherent geometry from multiple video streams.
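The class of functionals described above can be written out explicitly. The notation is an assumption made for illustration and is not taken from the abstract: $\Sigma$ is the unknown hypersurface, $\mathbf{n}(s)$ its unit normal at the point $s$, and $\Phi$ the scalar weight function.

```latex
\mathcal{A}(\Sigma) = \int_{\Sigma} \Phi\bigl(s, \mathbf{n}(s)\bigr)\, \mathrm{d}A(s)
```

In the simplest case $\Phi \equiv 1$, the functional is the surface area, and its Euler-Lagrange equation reduces to the classical minimal surface condition that the mean curvature vanishes everywhere on $\Sigma$.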
