Abstract
3D depth data, especially dynamic 3D depth data, offer several advantages over traditional intensity videos for expressing objects׳ actions, such as being useful in low light levels, resolving the silhouette ambiguity of actions, and being color and texture invariant. With the wide popularity of somatosensory equipment (Kinect for example), more and more dynamic 3D depth data are shared on the Internet, which results in an urgent need to retrieve these data efficiently and effectively. In this paper, we propose a generalized strategy for dynamic 3D depth data matching and apply this strategy in action retrieval task. Firstly, an improved 3D shape context descriptor (3DSCD) is proposed to extract features of each static depth frame. Then we employ dynamic time warping (DTW) to measure the temporal similarity between two 3D dynamic depth sequences. Experimental results on our collected dataset consisting of 170 dynamic 3D depth video clips show that the proposed 3DSCD has a rich descriptive power on depth data and that the method using 3DSCD and DTW achieves high matching accuracy. Finally, to address the matching efficiency problem, we utilize the bag of word (BoW) model to quantize the 3DSCD of each static depth frame into visual word packages. So the original feature matching problem is simplified into a two-histogram matching problem. The results demonstrate the matching efficiency of our proposed method, while still maintaining high matching accuracy.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.