Structure-Preserving Binary Representations for RGB-D Action Recognition.

Mengyang Yu, Ling Shao, Li Liu

doi:10.1109/tpami.2015.2491925

Open Access

https://doi.org/10.1109/tpami.2015.2491925

Abstract

In this paper, we propose a novel binary local representation for RGB-D video data fusion with a structure-preserving projection. Our contribution consists of two aspects. To acquire a general feature for the video data, we convert the problem to describing the gradient fields of the RGB and depth information of video sequences. With the local fluxes of the gradient fields, which include the orientation and the magnitude of the neighborhood of each point, a new kind of continuous local descriptor called the Local Flux Feature (LFF) is obtained. Then the LFFs from the RGB and depth channels are fused into a Hamming space via the Structure Preserving Projection (SPP). Specifically, an orthogonal projection matrix is applied to preserve the pairwise structure, with a shape constraint to avoid the collapse of the data structure in the projected space. Furthermore, a bipartite graph structure of the data is taken into consideration, which is regarded as a higher-level connection between samples and classes than the pairwise structure of local features. The extensive experiments show not only the high efficiency of binary codes and the effectiveness of combining LFFs from RGB-D channels via SPP on various action recognition benchmarks of RGB-D data, but also the potential power of LFF for general action recognition.
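To make the idea of mapping continuous descriptors into a Hamming space concrete, the following is a minimal sketch, not the SPP method itself. It assumes that fusion is plain concatenation of RGB and depth descriptors, and it uses a random matrix with orthonormal columns in place of the learned projection; the function names and the synthetic data are hypothetical.

```python
import numpy as np


def random_orthonormal_projection(dim, bits, seed=0):
    """Return a (dim x bits) matrix with orthonormal columns (needs bits <= dim).

    Assumption: a random stand-in for a learned projection matrix.
    """
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.standard_normal((dim, bits)))
    return q


def to_binary_codes(features, projection):
    """Center the features, project them, and keep only the sign as 0/1 bits."""
    centered = features - features.mean(axis=0)
    return (centered @ projection > 0).astype(np.uint8)


def hamming_distance(code_a, code_b):
    """Number of bit positions in which two binary codes differ."""
    return int(np.count_nonzero(code_a != code_b))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic stand-ins for continuous local descriptors from each channel.
    rgb_descriptors = rng.standard_normal((100, 32))
    depth_descriptors = rng.standard_normal((100, 32))
    # Assumption: fuse the two channels by simple concatenation.
    fused = np.hstack([rgb_descriptors, depth_descriptors])
    projection = random_orthonormal_projection(fused.shape[1], bits=16)
    codes = to_binary_codes(fused, projection)
    print(codes.shape, hamming_distance(codes[0], codes[1]))
```

Comparing two such codes only requires counting differing bits, which is why binary representations are cheap to match. The learned projection described in the abstract additionally preserves pairwise and bipartite graph structure, which this sketch does not attempt.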

Highlights

RGB-D sensors such as Kinect receive increasing attention in the computer vision community [1]
All these sequences are synchronously captured with a Kinect sensor. This dataset collects 10 categories of hand gestures in total: circle, triangle, up-down, right-left, wave, “Z”, cross, comehere, turnaround and pat. All these ten categories are performed with three hand postures: fist, index and flat
We illustrate the effectiveness of all the three terms used in Structure Preserving Projection (SPP), i.e., the pairwise label preserving term, the pairwise angle preserving term and the bigraph regularization

Summary

INTRODUCTION

RGB-D sensors such as Kinect receive increasing attention in the computer vision community [1]. To gain a more robust and accurate representation of samples, local feature descriptors such as SIFT [8], HOG3D [9], HOG [10], HOF [11] and MBH [12] have been proposed and have achieved notable success in classification and recognition. Based on these local features, the Bag-of-Words (BoW) model [13] and the Sparse Coding (SC) algorithm [14] have shown their effectiveness for both image classification and action recognition. It represents the sum of all distances from the local features of an image to their corresponding nearest neighbors in each class. Although it was proposed for image classification, it can be applied to any kind of sample represented by local feature descriptors. This makes it extremely fast and useful for many practical applications.
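The sum of nearest-neighbor distances mentioned above can be sketched as follows. This is only an illustration under stated assumptions: Euclidean distance is assumed, the predicted class is taken to be the one with the smallest sum, and the function names and toy data are hypothetical rather than taken from the paper.

```python
import numpy as np


def sample_to_class_distance(query_features, class_features):
    """Sum, over the query's local features, of the distance to the nearest
    local feature belonging to one class (Euclidean distance is an assumption)."""
    diffs = query_features[:, None, :] - class_features[None, :, :]
    pairwise = np.linalg.norm(diffs, axis=2)  # shape: (n_query, n_class)
    return float(pairwise.min(axis=1).sum())


def classify(query_features, features_by_class):
    """Pick the class label whose pooled local features give the smallest sum."""
    return min(
        features_by_class,
        key=lambda label: sample_to_class_distance(
            query_features, features_by_class[label]
        ),
    )


if __name__ == "__main__":
    # Toy 2-D local features pooled per class (hypothetical data).
    features_by_class = {
        "wave": np.array([[0.0, 0.0], [1.0, 0.0]]),
        "pat": np.array([[5.0, 5.0], [6.0, 5.0]]),
    }
    query = np.array([[0.1, 0.0], [0.9, 0.1]])
    print(classify(query, features_by_class))
```

Because the rule only needs local descriptors and a distance, the same function works for any sample represented by a set of local features, whether an image or a video sequence.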

RELATED WORK

LOCAL FLUX FEATURE

Flux Computation

Pairwise Structure Preserving

STRUCTURE PRESERVING PROJECTION

Pairwise Angle

Bigraph Regularization

Objective Function and Optimization

Complexity Analysis

EXPERIMENTS AND RESULTS

Datasets and Settings

Compared Results

Comparison with Other Methods

Methods

Statistical Significance Test

Results on RGB Video dataset

Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence	Publication Date: Oct 16, 2015
Citations: 182	License type: CC BY 3.0

Similar Papers

AMIR
Shinan Liu ... Tarun Mangla
Proceedings of the ACM on interactive, mobile, wearable and ubiquitous technologies | VOL. 7
27 Mar 2023

SO(3)‐Pose: SO(3)‐Equivariance Learning for 6D Object Pose Estimation
Haoran Pan ... Xuequan Lu
Computer Graphics Forum | VOL. 41
01 Oct 2022

RGB-Depth feature for 3D human activity recognition
Zhao Yang ... Cheng Hong
China Communications | VOL. 10
01 Jul 2013

3D human behavior recognition based on spatiotemporal texture features
Chunxiao Fan ... Yue Ming
01 Jun 2015
