RGB Video Research Articles

This paper addresses the challenge of reconstructing long volumetric videos from multi-view RGB videos. Recent dynamic view synthesis methods leverage powerful 4D representations, such as feature grids or point cloud sequences, to achieve high-quality rendering results. However, they are typically limited to short (1-2 s) video clips and often suffer from large memory footprints when dealing with longer videos. To solve this issue, we propose a novel 4D representation, named Temporal Gaussian Hierarchy, to compactly model long volumetric videos. Our key observation is that dynamic scenes generally exhibit varying degrees of temporal redundancy, because they consist of areas that change at different speeds. Motivated by this, our approach builds a multi-level hierarchy of 4D Gaussian primitives, in which each level separately describes scene regions with a different degree of content change and adaptively shares Gaussian primitives to represent unchanged scene content across different temporal segments, thus effectively reducing the number of Gaussian primitives. In addition, the tree-like structure of the Gaussian hierarchy allows us to efficiently represent the scene at a particular moment with a subset of the Gaussian primitives, leading to nearly constant GPU memory usage during training and rendering regardless of the video length. Moreover, we design a Compact Appearance Model that mixes diffuse and view-dependent Gaussians to further reduce the model size while maintaining rendering quality. We also develop a rasterization pipeline for Gaussian primitives based on hardware-accelerated techniques to improve rendering speed. Extensive experimental results demonstrate the superiority of our method over alternative methods in terms of training cost, rendering speed, and storage usage. To our knowledge, this work is the first approach capable of efficiently handling hours of volumetric video data while maintaining state-of-the-art rendering quality.
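The idea of representing one moment with only a subset of primitives can be illustrated with a toy sketch. It is not the paper's implementation: the class `TemporalHierarchy`, the rule that segment length doubles per level, and the primitive names are all assumptions made for illustration, since the abstract does not specify these details. It only shows how one segment per level can be looked up for a time `t`, so that the number of segments touched depends on the number of levels rather than on the video length.

```python
from collections import defaultdict


class TemporalHierarchy:
    """Toy temporal hierarchy (hypothetical, not the paper's data structure).

    Assumption: level k splits time into segments of base_len * 2**k seconds,
    so higher levels hold content that stays unchanged for longer.
    """

    def __init__(self, base_len: float, num_levels: int):
        self.base_len = base_len
        self.num_levels = num_levels
        # (level, segment index) -> ids of the primitives stored in that segment
        self.segments = defaultdict(list)

    def segment_index(self, level: int, t: float) -> int:
        """Index of the segment at `level` that contains time `t`."""
        return int(t // (self.base_len * 2 ** level))

    def insert(self, primitive_id: str, level: int, t: float) -> None:
        """Store a primitive in the segment of `level` that contains `t`."""
        self.segments[(level, self.segment_index(level, t))].append(primitive_id)

    def active(self, t: float) -> list:
        """Collect primitives from exactly one segment per level."""
        out = []
        for level in range(self.num_levels):
            key = (level, self.segment_index(level, t))
            out.extend(self.segments.get(key, []))
        return out


h = TemporalHierarchy(base_len=1.0, num_levels=3)
h.insert("static_wall", level=2, t=0.0)   # shared across the 4 s segment [0, 4)
h.insert("slow_plant", level=1, t=2.5)    # shared across the 2 s segment [2, 4)
h.insert("fast_hand_a", level=0, t=0.2)   # only valid in [0, 1)
h.insert("fast_hand_b", level=0, t=3.7)   # only valid in [3, 4)

print(h.active(3.5))  # ['fast_hand_b', 'slow_plant', 'static_wall']
print(h.active(0.5))  # ['fast_hand_a', 'static_wall']
```

In this sketch the slowly changing content is stored once at a coarse level and reused by every query that falls inside its segment, which mirrors the sharing of primitives described in the abstract.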

Biomechanical data collection was long confined largely to controlled laboratory setups, relying on marker-based systems or force platforms. However, the emergence of wearable sensors and markerless motion capture has revolutionized this field, enabling data collection in real-world scenarios. This shift has also sparked interest in integrating machine learning (ML) into biomechanical workflows, promising to revolutionize data acquisition in field settings and refine data analysis (Halilaj et al., 2018). Figure 1 provides an overview of corresponding ML applications for key tasks in the biomechanical workflow. This commentary explores the transformative potential of ML in biomechanics, focusing on enhancing data collection and analysis in real-world environments. Pose estimation (a) is the process of automatically tracking and determining the body’s anatomical landmarks, body segments, or joint centre locations in video images using ML, enabling the quantification of human movement without attaching markers or sensors to the body. Feature estimation (b) employs ML to predict complex biomechanical data, e.g. joint moments, from more accessible data sources, including IMUs, pressure insoles, and RGB video cameras. Event detection (c) in time series data is the annotation of specific events, which are used to extract useful information or to remove unwanted data before further analysis. Clustering (d) involves grouping instances or individuals with similar biomechanical characteristics using unsupervised ML, thereby revealing underlying structures and subgroups within complex biomechanical data. Finally, automated classification (e) refers to the process of developing a predictive model that assigns input features of data samples to predefined categories or classes using supervised ML. Despite the advances of ML in biomechanics, central challenges remain.
Estimation errors remain critically high depending on the task and application field, necessitating careful reflection on the potential of data acquisition. Furthermore, complex Deep Learning models in particular, while showing promising performance, lack transparency regarding their decision-making processes and the underlying patterns and rules learned from the data. This phenomenon, often termed the black-box nature of these models, poses a considerable obstacle. In response, Explainable Artificial Intelligence (XAI), a field that encompasses different explainability approaches to shed light on the inner workings of complex, non-linear ML models, has gained increasing attention in recent years (Slijepcevic et al., 2023). A further central challenge is the limited availability of relevant and comprehensive datasets, as well as of task-specific annotations, for conducting thorough analyses and research in biomechanics. The lack of available large-scale benchmark datasets restricts the widespread adoption of ML-based approaches in biomechanics. Privacy concerns present significant ethical and legal issues, e.g. with identifiable video data. Additionally, model validation remains a critical problem, as many studies fail to validate their models on diverse datasets, often relying on limited data from a single laboratory. Furthermore, there is a risk that the fundamental mechanical understanding of biomechanical processes might be overshadowed by an over-reliance on ML techniques. In conclusion, the integration of ML into biomechanics presents a transformative opportunity for understanding human movement by enabling and improving data collection and analysis in real-world settings. Challenges in data accessibility and methodological transparency necessitate collaborative efforts for future advancements.

Articles published on RGB Video

Unmanned aerial vehicles for human detection and recognition using neural-network model

Quark: Real-time, High-resolution, and General Neural View Synthesis

Representing Long Volumetric Video with Temporal Gaussian Hierarchy

BiCap: A novel bi-modal dataset of daily living dual-arm manipulation actions

Spatiotemporal Sensitive Network for Non-Contact Heart Rate Prediction from Facial Videos

Depth Video-Based Secondary Action Recognition in Vehicles via Convolutional Neural Network and Bidirectional Long Short-Term Memory with Spatial Enhanced Attention Mechanism

PrivacyLens: On-Device PII Removal from RGB Images using Thermally-Enhanced Sensing

From lab to field with machine learning – Bridging the gap for movement analysis in real-world environments: A commentary

SSTtrack: A unified hyperspectral video tracking framework via modeling spectral-spatial-temporal conditions

SmartDetector: Automatic and vision-based approach to point-light display generation for human action perception.

Fully automated unsupervised learning approach for thermal camera calibration and an accurate COVID-19 human temperature tracking

ST-Phys: Unsupervised Spatio-Temporal Contrastive Remote Physiological Measurement.

Contrast-Phys+: Unsupervised and Weakly-Supervised Video-Based Remote Physiological Measurement via Spatiotemporal Contrast.

Deep Learning-based Depth Estimation Methods from Monocular Image and Videos: A Comprehensive Survey

Incorporating texture and silhouette for video-based person re-identification

Towards Estimation of 3D Poses and Shapes of Animals from Oblique Drone Imagery

Temporal cues enhanced multimodal learning for action recognition in RGB-D videos

Depth over RGB: automatic evaluation of open surgery skills using depth camera

Facial expressions to identify post-stroke: A pilot study

HQ3DAvatar: High-quality Implicit 3D Head Avatar
