Depth Video Research Articles

Medical teleconsultation was among the initial use cases for early telepresence research projects, since medical treatment often requires timely intervention by highly specialized experts. When remote medical experts support interventions, a holistic view of the surgical site can increase situation awareness and improve team communication. A possible solution is the concept of immersive telepresence, where remote users virtually join the operating theater, which is transmitted based on a real-time reconstruction of the local site. Enabled by the availability of RGB-D sensors and sufficient computing capability, it becomes possible to capture such a site in real time using multiple stationary sensors. The 3D reconstruction and simplification of textured surface meshes from the point clouds of a dynamic scene in real time is challenging and becomes infeasible as capture volumes increase. This work presents a tightly integrated, stateless 3D reconstruction pipeline for dynamic, room-scale environments that generates simplified surface meshes from multiple RGB-D sensors in real time. Our algorithm operates directly on the fused, voxelized point cloud instead of populating signed-distance volumes per frame and using a marching cubes variant for surface reconstruction. We extend the formulation of the dual contouring algorithm to work for point cloud data stored in an octree and interleave a vertex-clustering-based simplification before extracting the surface geometry. Our 3D reconstruction pipeline can perform a live reconstruction of six incoming depth videos at their native frame rate of 30 frames per second, enabling the reconstruction of smooth movement. Arbitrarily complex scene changes are possible since we do not store persistent information between frames. In terms of mesh quality and hole filling, our method falls between direct mesh reconstruction and the expensive global fitting of implicit functions.
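The voxelization and vertex-clustering idea mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: it uses a flat voxel grid rather than an octree, omits dual contouring and mesh extraction entirely, and the function name `voxel_cluster` and its parameters are hypothetical. It only shows how points fused from several sensors might be collapsed to one representative vertex per occupied voxel.

```python
from collections import defaultdict
import math


def voxel_cluster(points, voxel_size):
    """Collapse 3D points to one centroid per occupied voxel.

    Hypothetical helper for illustration: a flat grid stands in for
    the octree described in the abstract.
    """
    if voxel_size <= 0:
        raise ValueError("voxel_size must be positive")

    # Per-voxel accumulator: [sum_x, sum_y, sum_z, count]
    sums = defaultdict(lambda: [0.0, 0.0, 0.0, 0])
    for x, y, z in points:
        # floor() keeps negative coordinates in the correct voxel
        key = (
            math.floor(x / voxel_size),
            math.floor(y / voxel_size),
            math.floor(z / voxel_size),
        )
        acc = sums[key]
        acc[0] += x
        acc[1] += y
        acc[2] += z
        acc[3] += 1

    # One representative vertex (the centroid) per occupied voxel
    return {
        key: (sx / n, sy / n, sz / n)
        for key, (sx, sy, sz, n) in sums.items()
    }


if __name__ == "__main__":
    cloud = [(0.1, 0.1, 0.1), (0.3, 0.3, 0.3), (1.2, 0.0, 0.0)]
    for voxel, centroid in sorted(voxel_cluster(cloud, 1.0).items()):
        print(voxel, centroid)
```

Because the accumulator is rebuilt from scratch on every call, the sketch keeps no state between frames, which mirrors the stateless design the abstract describes.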

In this paper, we propose an intra-picture prediction method for depth video that clusters blocks with a neural network. The method addresses the problem that intra prediction performs poorly on depth-video blocks containing two or more clusters. The proposed neural network consists of a spatial feature prediction network and a clustering network. The spatial feature prediction network uses spatial features in the vertical and horizontal directions and contains a 1D CNN layer and a fully connected layer. The 1D CNN layer extracts the spatial features for the vertical and horizontal directions from the top block and the left block of the reference pixels, respectively. Although 1D CNNs are designed to handle time-series data, they can also extract spatial features by treating the pixel order along a given direction as a timestamp. The fully connected layer predicts the spatial features of the block to be coded from the extracted features. The clustering network finds clusters from the spatial features output by the spatial feature prediction network. It consists of four CNN layers: the first three combine the vertical and horizontal spatial features, and the last outputs the probabilities that pixels belong to the clusters. Each pixel of the block is predicted by the representative value of its cluster, which is the average of the reference pixels belonging to that cluster. To support intra prediction for various block sizes, the block is scaled to the size of the network input, and the prediction result is scaled back to the original size. In network training, the mean squared error between the original block and the predicted block is used as the loss function. A penalty for output values far from both ends is added to the loss function to encourage clear clustering. In the simulation results, the bit rate is reduced by up to 12.45% under the same distortion condition compared with the latest video coding standard.
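The prediction and loss steps described in this abstract can be sketched as follows. This is a rough illustration under stated assumptions, not the paper's implementation: it assumes exactly two clusters, a single per-pixel output in [0, 1] whose "ends" are 0 and 1, a hard 0.5 threshold for cluster membership, and a penalty of the form p(1 - p). The abstract does not specify any of these, and all function names and the `penalty_weight` parameter are hypothetical. The neural networks themselves are not modeled; the probabilities are taken as given inputs.

```python
def representative_values(ref_pixels, ref_probs, threshold=0.5):
    """Average the reference pixels assigned to each of two clusters."""
    groups = ([], [])
    for value, p in zip(ref_pixels, ref_probs):
        groups[1 if p >= threshold else 0].append(value)
    # Assumption: an empty cluster falls back to the mean of all references
    fallback = sum(ref_pixels) / len(ref_pixels)
    return tuple(sum(g) / len(g) if g else fallback for g in groups)


def predict_block(block_probs, reps, threshold=0.5):
    """Predict each pixel as the representative value of its cluster."""
    return [
        [reps[1 if p >= threshold else 0] for p in row]
        for row in block_probs
    ]


def clustering_loss(original, predicted, block_probs, penalty_weight=0.1):
    """Mean squared error plus a penalty on outputs far from 0 and 1."""
    n = sum(len(row) for row in original)
    mse = sum(
        (o - q) ** 2
        for row_o, row_q in zip(original, predicted)
        for o, q in zip(row_o, row_q)
    ) / n
    # p * (1 - p) is zero at both ends and largest at p = 0.5 (assumed form)
    penalty = sum(p * (1.0 - p) for row in block_probs for p in row) / n
    return mse + penalty_weight * penalty


if __name__ == "__main__":
    reps = representative_values([100, 102, 200, 198], [0.1, 0.2, 0.9, 0.8])
    probs = [[0.0, 1.0], [0.0, 1.0]]
    block = predict_block(probs, reps)
    print(reps, block, clustering_loss(block, block, probs))
```

Depth blocks that straddle an object boundary typically contain two nearly flat regions, which is why predicting each pixel from a per-cluster average, rather than from a single directional extrapolation, can be a reasonable fit for this kind of content.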

Articles published on Depth Video

Real-Time 3D Reconstruction Pipeline for Room-Scale, Immersive, Medical Teleconsultation

Global-Context Aggregated Intra Prediction Network for Depth Video Coding

Close contact behaviors of university and school students in 10 indoor environments

Transformer Models and Convolutional Networks with Different Activation Functions for Swallow Classification Using Depth Video Data

Attention-Based Fusion of Ultrashort Voice Utterances and Depth Videos for Multimodal Person Identification

Linked motion image‐based dynamic hand gesture recognition

Multisensor Collaboration Network for Video Compression Based on Wavelet Decomposition

Saving Bits Using Multi-Sensor Collaboration

Intra Prediction Method for Depth Video Coding by Block Clustering through Deep Learning

Spatial and temporal information fusion for human action recognition via Center Boundary Balancing Multimodal Classifier

TwinEDA: a sustainable deep-learning approach for limb-position estimation in preterm infants' depth images.

Novel 3D video action recognition deep learning approach for near real time epileptic seizure classification

Real-time human action recognition using raw depth video-based recurrent neural networks

Selective video enhancement in the Laguerre–Gauss domain

Discriminative Multi-View Dynamic Image Fusion for Cross-View 3-D Action Recognition.

An Efficient Approach for Patterns of Oriented Motion Flow Facial Expression Recognition from Depth Video

A Deep Sequence Learning Framework for Action Recognition in Small-Scale Depth Video Dataset.

Multiple Resolution Prediction With Deep Up-Sampling for Depth Video Coding

RDEN: Residual Distillation Enhanced Network-Guided Lightweight Synthesized View Quality Enhancement for 3D-HEVC

Deep Learning-Based Perceptual Video Quality Enhancement for 3D Synthesized View
