Abstract

We present an effective framework, called TriViews, that utilizes 3D depth data for action recognition. It projects 3D depth maps onto three views: the front, side, and top views. Under this framework, features are extracted from each view separately and then combined across the three views to derive a complete description of the 3D data. To study the effectiveness of the TriViews framework, we extract five features: spatiotemporal interest points (STIP), dense trajectory shape (DT-Shape), dense trajectory motion boundary histograms (DT-MBH), skeleton trajectory shape (ST-Shape), and skeleton trajectory motion boundary histograms (ST-MBH). The first three are representative features for action recognition in intensity data, adapted here to depth sequences. The last two, which we propose, are skeleton-based features unique to 3D depth data. RGB-D sensors such as the Kinect provide the 3D positions of 20 skeleton joints, and the evolution of each joint over time forms one skeleton trajectory. Along each skeleton trajectory we extract a shape descriptor (ST-Shape) and motion boundary histograms (ST-MBH) to characterize actions with sparse trajectories. The five features capture action patterns from different aspects; the three best-performing features are selected and fused via a probabilistic fusion approach (PFA). We evaluate the proposed framework on three challenging depth action datasets. Experimental results show that TriViews achieves the most accurate depth-based action recognition, outperforming state-of-the-art methods on all three datasets.
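The abstract does not give the exact projection formulas, but the core idea of the tri-view projection can be sketched as follows: the front view is the depth map itself (the XY plane), while the side and top views are obtained by binning depth values along the Z axis and recording occupancy on the YZ and XZ planes. The sketch below is a minimal illustration under these assumptions; num_depth_bins and max_depth are hypothetical parameters, not values from the paper.

```python
# A minimal sketch (not the authors' code) of projecting one depth frame onto
# front, side, and top views. Assumes an HxW array of depth values and a
# hypothetical depth range / bin count.
import numpy as np

def tri_view_projection(depth_map, num_depth_bins=256, max_depth=4000.0):
    """Project one depth frame onto front (XY), side (YZ), and top (XZ) views."""
    h, w = depth_map.shape
    front = depth_map.copy()                       # front view: the raw depth map

    valid = depth_map > 0                          # ignore holes / missing depth
    z = np.clip(depth_map / max_depth * (num_depth_bins - 1),
                0, num_depth_bins - 1).astype(int)
    ys, xs = np.nonzero(valid)
    zbins = z[ys, xs]

    side = np.zeros((h, num_depth_bins), dtype=np.uint8)   # YZ-plane occupancy
    side[ys, zbins] = 1
    top = np.zeros((num_depth_bins, w), dtype=np.uint8)    # XZ-plane occupancy
    top[zbins, xs] = 1
    return front, side, top
```

Each of the five features can then be extracted independently per view, which is what makes the framework modular.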
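Similarly, the exact ST-Shape formulation is not given in the abstract. A plausible reading, by analogy with the dense-trajectory shape descriptor it parallels, is the sequence of a joint's frame-to-frame displacement vectors normalized by the total trajectory length. A minimal sketch under that assumption:

```python
# Hypothetical skeleton-trajectory shape descriptor (an assumption, not the
# paper's exact formulation), mirroring the dense-trajectory shape descriptor.
import numpy as np

def st_shape(joint_positions):
    """joint_positions: (L, 3) array of one joint's 3D coordinates over L frames."""
    deltas = np.diff(joint_positions, axis=0)      # (L-1, 3) displacement vectors
    length = np.linalg.norm(deltas, axis=1).sum()  # total trajectory length
    if length < 1e-8:                              # a static joint yields zeros
        return np.zeros(deltas.size)
    return (deltas / length).ravel()               # normalized, flattened descriptor
```

With 20 tracked joints, concatenating or pooling these per-joint descriptors gives a sparse-trajectory counterpart to DT-Shape.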
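The abstract also leaves the PFA fusion rule unspecified. One standard probabilistic fusion, shown purely as an illustration and not as the paper's method, multiplies the per-class posteriors from the three selected features and renormalizes:

```python
# Illustrative product-rule fusion of per-feature class posteriors.
# This is a common baseline, not necessarily the paper's PFA.
import numpy as np

def fuse_posteriors(posteriors):
    """posteriors: list of (num_classes,) class-probability vectors, one per feature."""
    fused = np.prod(np.stack(posteriors), axis=0)  # product rule (independence assumption)
    return fused / fused.sum()                     # renormalize to a distribution

# Predicted action = class with the highest fused probability, e.g. (hypothetical names):
# label = int(np.argmax(fuse_posteriors([p_stip, p_dt_shape, p_st_mbh])))
```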
