A Framework for Scene-Flow Driven Creation of Time-Consistent Dynamic 3D Objects using Mesh Parametrizations
In this paper we propose a novel method to create dynamic mesh sequences with fixed connectivity from multiple camera video streams. Fixed connectivity is useful for dynamic mesh coding as mesh connectivity has to be transmitted only once and not frame-wise. The proposed method runs automatically. It deploys mesh parametrizations and Voronoi diagrams from the computer graphics realm as well as 2D optical and 3D scene flow from computer vision. Assuming that a given dynamic mesh sequence does not necessarily have constant connectivity, motion estimation is performed by applying a 3D motion field to a static reconstruction in one frame. Afterwards, mesh connectivity is transferred patch-wise from one frame to the next one using remeshing techniques. The entire method is independent of static mesh resolution, hence supplying a basis for dynamic mesh coding and simplification.
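The motion-estimation step described above, applying a 3D motion field to a static reconstruction, can be illustrated with a minimal sketch. It assumes the scene flow has already been sampled as one displacement vector per vertex; the names `advect_vertices` and `build_sequence` are hypothetical, and the patch-wise connectivity transfer via parametrization and remeshing is not shown.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Face = Tuple[int, int, int]


def advect_vertices(vertices: List[Vec3], flow: List[Vec3]) -> List[Vec3]:
    """Move every vertex by its own 3D scene-flow vector (assumed per-vertex)."""
    if len(vertices) != len(flow):
        raise ValueError("need exactly one flow vector per vertex")
    return [
        (x + dx, y + dy, z + dz)
        for (x, y, z), (dx, dy, dz) in zip(vertices, flow)
    ]


def build_sequence(vertices0: List[Vec3], faces: List[Face],
                   flows: List[List[Vec3]]):
    """Return per-frame vertex positions plus the single shared face list.

    Only vertex positions change from frame to frame, so the connectivity
    (faces) would need to be stored or transmitted once.
    """
    frames = [list(vertices0)]
    for flow in flows:
        frames.append(advect_vertices(frames[-1], flow))
    return frames, faces
```

In this simplified form, a sequence of N frames is stored as one face list plus N vertex arrays, which is the property that makes fixed connectivity attractive for dynamic mesh coding.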
- Research Article
- 10.7939/r3h30j
- Jan 1, 2009
- University of Alberta Library
With the progress of computer graphics and computer vision technologies, 3D/multiview video applications such as 3D-TV and tele-immersive conferencing are becoming more and more popular and are very likely to emerge as a prime application in the near future. A successful 3D/multiview video system needs synergistic integration of various technologies such as 3D/multiview video acquisition, compression, transmission and rendering. In this thesis, we focus on addressing the challenges of multiview video compression. In particular, we have made five major contributions: (1) We propose a novel neighbor-based multiview video compression system which helps remove the inter-view redundancies among multiple video streams and improve the performance. An optimal stream encoding order algorithm is designed to enable the encoder to automatically decide the stream encoding order and find the best reference streams. (2) A novel multiview video transcoder is designed and implemented. The proposed multiview video transcoder can be used to encode multiple compressed video streams and reduce the cost of the multiview video acquisition system. (3) A learning-based multiview video compression scheme is invented. The novel multiview video compression algorithms are built on recent advances in semi-supervised learning algorithms and achieve compression by finding a sparse representation of images. (4) Two novel distributed source coding algorithms, EETG and SNS-SWC, are put forward. Both EETG and SNS-SWC are capable of achieving the whole Slepian-Wolf rate region and are syndrome-based schemes. EETG simplifies the code construction algorithm for distributed source coding schemes using an extended Tanner graph and is able to handle mismatched bits at the encoder. SNS-SWC has two independent decoders and thus can simplify the decoding process. (5) We propose a novel distributed multiview video coding scheme which allows flexible rate allocation between two distributed multiview video encoders. SNS-SWC is used as the underlying Slepian-Wolf coding scheme. It is the first work to realize simultaneous Slepian-Wolf coding of stereo videos with the help of a distributed source code that achieves the whole Slepian-Wolf rate region. The proposed scheme has a better rate-distortion performance than the separate H.264 coding scheme in the high-rate case.
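For reference, the Slepian-Wolf rate region mentioned above can be written out for two correlated sources. The symbols $X$, $Y$, $R_X$ and $R_Y$ are notation assumed here for illustration and do not appear in the abstract.

```latex
\begin{aligned}
R_X &\ge H(X \mid Y),\\
R_Y &\ge H(Y \mid X),\\
R_X + R_Y &\ge H(X, Y).
\end{aligned}
```

A scheme that achieves the whole region can operate at any point on the boundary $R_X + R_Y = H(X, Y)$, not only at the two corner points, which is what makes flexible rate allocation between two encoders possible.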
- Conference Article
2
- 10.1109/ccdc49329.2020.9164266
- Aug 1, 2020
In some scenarios, we need to combine multiple video streams and pictures into a single video. In this paper, we develop an application that synthesizes two channels of video streams and two channels of pictures. Our application can be used for synthetic display of multiple network video streams and for offline synthetic video storage. Because single-threaded decoding of multiple video streams takes a lot of time, we propose the use of a multithreaded producer-consumer pattern. It makes full use of the advantages of a multi-core CPU and improves program efficiency. In our experiments, we compare video synthesis using a single thread with video synthesis using multiple threads. Multithreading is much faster than single threading.
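The producer-consumer pattern named above can be sketched as follows. This is a minimal illustration, not the paper's application: the decoders are replaced by stand-in producers that emit `(stream_id, frame_index)` tuples, and the names `producer`, `consumer` and `synthesize` are hypothetical.

```python
import queue
import threading

SENTINEL = None  # marks the end of a stream


def producer(stream_id, n_frames, out_q):
    """Stand-in for a decoder thread: emits (stream_id, frame_index) items."""
    for i in range(n_frames):
        out_q.put((stream_id, i))  # blocks while the bounded queue is full
    out_q.put(SENTINEL)


def consumer(queues):
    """Stand-in for the compositor: takes one frame per stream per output frame."""
    composed = []
    while True:
        items = [q.get() for q in queues]  # blocks until each stream has a frame
        if any(item is SENTINEL for item in items):
            break
        composed.append(tuple(items))
    return composed


def synthesize(n_streams=2, n_frames=5, capacity=4):
    """Run one producer thread per stream and compose frames in the caller.

    All streams are assumed to have the same length; unequal lengths would
    need extra draining logic so that no producer blocks forever.
    """
    queues = [queue.Queue(maxsize=capacity) for _ in range(n_streams)]
    threads = [
        threading.Thread(target=producer, args=(sid, n_frames, q))
        for sid, q in enumerate(queues)
    ]
    for t in threads:
        t.start()
    result = consumer(queues)
    for t in threads:
        t.join()
    return result
```

The bounded queues keep fast producers from running arbitrarily far ahead of the compositor, which caps memory use at `capacity` frames per stream.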
- Conference Article
48
- 10.1145/769953.769954
- May 22, 2003
We present blue-c, a new generation immersive projection and 3D video acquisition environment for virtual design and collaboration. It combines simultaneous acquisition of multiple live video streams with advanced 3D projection technology in a CAVE™-like environment, creating the impression of total immersion. The blue-c portal currently consists of three rectangular projection screens that are built from glass panels containing liquid crystal layers. These screens can be switched from a whitish opaque state (for projection) to a transparent state (for acquisition), which allows the video cameras to “look through” the walls. Our projection technology is based on active stereo using two LCD projectors per screen. The projectors are synchronously shuttered along with the screens, the stereo glasses, active illumination devices, and the acquisition hardware. From multiple video streams, we compute a 3D video representation of the user in real time. The resulting video inlays are integrated into a networked virtual environment. Our design is highly scalable, enabling blue-c to connect to portals with less sophisticated hardware.
- Research Article
164
- 10.1145/882262.882350
- Jul 1, 2003
- ACM Transactions on Graphics
We present blue-c, a new immersive projection and 3D video acquisition environment for virtual design and collaboration. It combines simultaneous acquisition of multiple live video streams with advanced 3D projection technology in a CAVE™-like environment, creating the impression of total immersion. The blue-c portal currently consists of three rectangular projection screens that are built from glass panels containing liquid crystal layers. These screens can be switched from a whitish opaque state (for projection) to a transparent state (for acquisition), which allows the video cameras to "look through" the walls. Our projection technology is based on active stereo using two LCD projectors per screen. The projectors are synchronously shuttered along with the screens, the stereo glasses, active illumination devices, and the acquisition hardware. From multiple video streams, we compute a 3D video representation of the user in real time. The resulting video inlays are integrated into a networked virtual environment. Our design is highly scalable, enabling blue-c to connect to portals with less sophisticated hardware.
- Conference Article
277
- 10.1145/1201775.882350
- Jul 1, 2003
We present blue-c, a new immersive projection and 3D video acquisition environment for virtual design and collaboration. It combines simultaneous acquisition of multiple live video streams with advanced 3D projection technology in a CAVE™-like environment, creating the impression of total immersion. The blue-c portal currently consists of three rectangular projection screens that are built from glass panels containing liquid crystal layers. These screens can be switched from a whitish opaque state (for projection) to a transparent state (for acquisition), which allows the video cameras to look through the walls. Our projection technology is based on active stereo using two LCD projectors per screen. The projectors are synchronously shuttered along with the screens, the stereo glasses, active illumination devices, and the acquisition hardware. From multiple video streams, we compute a 3D video representation of the user in real time. The resulting video inlays are integrated into a networked virtual environment. Our design is highly scalable, enabling blue-c to connect to portals with less sophisticated hardware.
- Conference Article
1
- 10.1109/cisis.2013.28
- Jul 1, 2013
An innovative Internet streaming video player, which we call ePlayer, has been researched, developed and evaluated; it supports personalised camera switching for live events. The main novelty of the system is that, in dividing the available bandwidth amongst multiple input video streams, it can automatically coordinate and preserve the video quality across multiple live video streams. An additional novelty of the system is that it supports automatic camera switching based upon an individual user's preferences. The experimental results indicate that the system is able to effectively allocate a contested bandwidth resource amongst multiple streams and is able to infer a user's camera switching preferences by dynamically predicting the user's switching intervals amongst multiple cameras.
- Conference Article
- 10.1109/icufn.2012.6261740
- Jul 1, 2012
3D transmission is a new and challenging research issue in wireless networks. A 3D video stream is composed of multiple 2D video streams that together create the 3D viewing experience at the receiver. Compared to a 2D video stream, a 3D video stream requires significantly more bandwidth. In wireless networks, however, the limited frequency spectrum makes it difficult to transmit the full 3D video stream in one channel. One solution is traffic bifurcation: the 3D video is split into multiple 2D video streams, each 2D video stream is transmitted over a separate routing path, and the 3D video is then combined and reconstructed at the destination node. However, with this traffic bifurcation scheme, the relay nodes on different routing paths might need to cooperate on the modulation scheme and transmission power so that the destination node receives the same video quality and can successfully reconstruct the 3D video. In addition, the routing decision needs to consider the radio resources of each node, because each node on the routing paths might transmit or receive multiple 2D video streams. Besides frequency spectrum and radio resources, another key performance measure is power consumption, because most nodes in wireless networks are battery driven. This is a challenging cross-layer design problem that includes SNR-aware power control in the physical layer, cooperative routing in the network layer, and video quality awareness in the application layer, with the goal of delivering the 3D video in a power-efficient way without violating the frequency spectrum and radio resource constraints. I propose a novel IVIP heuristic to tackle this problem. According to the computational experiments, the proposed IVIP heuristic outperforms the other heuristics in all tested cases.
- Conference Article
63
- 10.1109/cvpr.2001.991044
- Dec 1, 2001
In this paper, novel algorithms for computing dense 3D scene flow from multiview image sequences are described. A new hierarchical rule-based stereo matching algorithm is presented to estimate the initial disparity map. Different available constraints under a multiview camera setup are investigated and then utilized in the proposed motion estimation algorithms. We show two different formulations for 3D scene flow computation. One formulation assumes that the initial disparity map is accurate, while the other does not make this assumption. Image segmentation information is used to maintain the motion and depth discontinuities. Iterative implementations are used to successfully compute 3D scene flow and structure at every point in the reference image. Novel hard constraints are introduced in this paper to make the algorithms more accurate and robust. Promising experimental results are seen by applying our algorithms to real imagery.
- Research Article
13
- 10.1109/tcsvt.2016.2610078
- Apr 1, 2017
- IEEE Transactions on Circuits and Systems for Video Technology
As of today, fine-grained editing of facial performances in movie and video production requires either retouching every single frame or creating a highly detailed CGI model of the actor, both of which are restricted to high-budget productions. In this paper, we present an example-based approach for facial performance editing that achieves realistic results with standard equipment and very little manual intervention. Based on a model-free surface tracking approach, temporally consistent dynamic texture sequences are extracted from multiple video streams. Using such geometry-plus-texture sequences allows transferring facial expressions/performances between videos, which enables the editor, for example, to change the facial expression while leaving the rest of the video untouched. Moreover, concatenating and/or looping these sequences in a motion-graph-like manner offers a convenient way of composing novel facial performances from multiple source videos. Finally, we present a blending method to seamlessly concatenate texture/mesh sequences and to insert a composed facial performance in a target video.
- Research Article
10
- 10.1111/cgf.13934
- May 1, 2020
- Computer Graphics Forum
Subdivision surfaces have become an invaluable asset in production environments. While progress over the last years has allowed the use of graphics hardware to meet performance demands during animation and rendering, high performance is limited to immutable mesh connectivity scenarios. Motivated by recent progress in mesh data structures, we show how the complete Catmull‐Clark subdivision scheme can be abstracted in the language of linear algebra. While this high‐level formulation allows for a fully parallel implementation with significant performance gains, the underlying algebraic operations require further specialization for modern parallel hardware. Integrating domain knowledge about the mesh matrix data structure, we replace costly general linear algebra operations like matrix‐matrix multiplication by specialized kernels. By further considering innate properties of Catmull‐Clark subdivision, like the quad‐only structure after refinement, we achieve an additional order of magnitude in performance and significantly reduce memory footprints. Our approach can be adapted seamlessly for different use cases, such as regular subdivision of dynamic meshes, fast evaluation for immutable topology and feature‐adaptive subdivision for efficient rendering of animated models. In this way, patchwork solutions are avoided in favor of a streamlined solution with consistent performance gains throughout the production pipeline. The versatility of the sparse matrix linear algebra abstraction underlying our work is further demonstrated by extension to other schemes such as √3 and Loop subdivision.
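To make the linear-algebra view concrete, the sketch below expresses only the first Catmull-Clark step, computing face points, as a sparse matrix applied to the vertex positions. It is an illustration under assumptions and not the paper's specialized kernels: the sparse matrix is stored as one dictionary per row, and `face_point_matrix` and `sparse_matmul` are hypothetical names.

```python
def face_point_matrix(faces):
    """Build sparse rows with F[i][v] = 1/len(face_i) for each vertex v of face i.

    Each row averages the vertices of one face, so F times the vertex
    positions yields the Catmull-Clark face points.
    """
    return [{v: 1.0 / len(face) for v in face} for face in faces]


def sparse_matmul(rows, positions):
    """Multiply a row-dictionary sparse matrix by a dense list of points."""
    dim = len(positions[0])
    return [
        tuple(sum(w * positions[v][k] for v, w in row.items()) for k in range(dim))
        for row in rows
    ]


# One quad face over a 2 x 2 square in the z = 0 plane.
vertices = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (2.0, 2.0, 0.0), (0.0, 2.0, 0.0)]
faces = [(0, 1, 2, 3)]
face_points = sparse_matmul(face_point_matrix(faces), vertices)
```

Edge points and updated vertex points can be written as further sparse products in the same style. Because the matrices depend only on connectivity, they can be reused across frames when the topology does not change.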
- Research Article
6
- 10.1016/j.cma.2021.114222
- Jan 13, 2022
- Computer Methods in Applied Mechanics and Engineering
Krylov subspace recycling for evolving structures
- Conference Article
89
- 10.1109/cvpr42600.2020.00141
- Jun 1, 2020
We describe an approach for upgrading 2D optical flow to 3D scene flow. Our key insight is that dense optical expansion – which can be reliably inferred from monocular frame pairs – reveals changes in depth of scene elements, e.g., things moving closer will get bigger. When integrated with camera intrinsics, optical expansion can be converted into normalized 3D scene flow vectors that provide meaningful directions of 3D movement, but not their magnitude (due to an underlying scale ambiguity). Normalized scene flow can be further “upgraded” to the true 3D scene flow knowing depth in one frame. We show that dense optical expansion between two views can be learned from annotated optical flow maps or unlabeled video sequences, and applied to a variety of dynamic 3D perception tasks including optical scene flow, LiDAR scene flow, time-to-collision estimation and depth estimation, often demonstrating significant improvement over the prior art.
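The geometry behind this upgrade can be sketched for a single pixel under a pinhole camera model. The sketch assumes that the depth ratio Z'/Z is approximated by the reciprocal of the optical expansion (an object that appears twice as large is taken to be at half the depth); the function names and the `(fx, fy, cx, cy)` intrinsics tuple are hypothetical, and the paper's learned estimation is not reproduced.

```python
def backproject_ray(u, v, fx, fy, cx, cy):
    """Return the viewing ray K^-1 (u, v, 1) of a pixel, with unit z component."""
    return ((u - cx) / fx, (v - cy) / fy, 1.0)


def normalized_scene_flow(pixel, flow2d, expansion, intrinsics):
    """Scene flow divided by the (unknown) depth Z of the first frame.

    With P = Z * r0 and P' = Z' * r1, it follows that
    (P' - P) / Z = tau * r1 - r0, where tau = Z' / Z.
    """
    fx, fy, cx, cy = intrinsics
    tau = 1.0 / expansion  # assumption: depth ratio is the inverse of expansion
    r0 = backproject_ray(pixel[0], pixel[1], fx, fy, cx, cy)
    r1 = backproject_ray(pixel[0] + flow2d[0], pixel[1] + flow2d[1],
                         fx, fy, cx, cy)
    return tuple(tau * b - a for a, b in zip(r0, r1))


def scene_flow(normalized, depth):
    """Upgrade normalized flow to metric 3D flow using depth in the first frame."""
    return tuple(depth * c for c in normalized)
```

The normalized vector fixes the direction of 3D motion, and multiplying by the first-frame depth resolves the scale ambiguity described in the abstract.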
- Research Article
227
- 10.1016/j.isprsjprs.2017.09.013
- Nov 22, 2017
- ISPRS Journal of Photogrammetry and Remote Sensing
Object Scene Flow
- Research Article
50
- 10.1016/j.cviu.2007.04.002
- May 13, 2007
- Computer Vision and Image Understanding
Multi-scale 3D scene flow from binocular stereo sequences
- Book Chapter
28
- 10.1007/978-3-540-24671-8_29
- Jan 1, 2004
Many interesting problems in computer vision can be formulated as a minimization problem for an energy functional. If this functional is given as an integral of a scalar-valued weight function over an unknown hypersurface, then the minimal surface we are looking for can be determined as a solution of the functional’s Euler-Lagrange equation. This paper deals with a general class of weight functions that may depend on the surface point and normal. By making use of a mathematical tool called the method of the moving frame, we are able to derive the Euler-Lagrange equation in arbitrary-dimensional space and without the need for any surface parameterization. Our work generalizes existing proofs, and we demonstrate that it yields the correct evolution equations for a variety of previous computer vision techniques which can be expressed in terms of our theoretical framework. In addition, problems involving minimal hypersurfaces in dimensions higher than three, which were previously impossible to solve in practice, can now be introduced and handled by generalized versions of existing algorithms. As one example, we sketch a novel idea for reconstructing temporally coherent geometry from multiple video streams.