6D object pose estimation is an essential task in vision-based robotic grasping and manipulation. Prior works estimate an object's 6D pose by regressing from a single RGB-D frame without accounting for occluded objects, limiting their performance in human-robot collaboration scenarios with heavy occlusion. In this paper, we present an end-to-end model named \textit{TemporalFusion}, which integrates temporal motion information from RGB-D images for 6D object pose estimation. The core of the proposed model is to embed and fuse temporal motion information from multi-frame RGB-D sequences, which enables it to handle heavy occlusion in human-robot collaboration tasks. Furthermore, the proposed deep model also produces stable pose sequences, which is essential for real-time robotic grasping tasks. We evaluated the proposed method on the YCB-Video dataset, and experimental results show that our model outperforms state-of-the-art approaches. Our code is available at https://github.com/mufengjun260/H-MPose.