Abstract

Kinect, a 3D digital capture device, can rapidly collect RGB and depth information about human activities. We study the fusion of depth and RGB information for activity recognition. We introduce color histogram-based image thresholding to detect skin on the human body, and use a Gaussian mixture model (GMM) to segment human hand regions. We design a new local descriptor, the 3D Motion Scale-Invariant Feature Transform (3D MoSIFT), which detects interest points using both RGB and depth information and then encodes the visual and motion information from both to describe those points. Experiments on a video dataset collected with a Kinect camera show that adding depth information to the descriptor distinctly improves the accuracy of human activity recognition. We use the F1-score to evaluate our method and to compare its performance with that of other algorithms.
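
For readers unfamiliar with the evaluation measure, the F1-score is the harmonic mean of precision and recall. The following is a minimal sketch of the standard definition for a single activity class, not the evaluation code used in the experiments; the function name `f1_score` and the example counts are hypothetical.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Return the F1-score (harmonic mean of precision and recall) for one class.

    The three counts are illustrative inputs; they are not taken from the paper.
    """
    predicted = true_positives + false_positives  # clips labelled as this class
    actual = true_positives + false_negatives     # clips that truly belong to it
    if predicted == 0 or actual == 0:
        return 0.0  # precision or recall is undefined; report 0 by convention
    precision = true_positives / predicted
    recall = true_positives / actual
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 8 correct detections, 2 false alarms, 4 misses.
example = f1_score(8, 2, 4)
```

With these made-up counts, precision is 8/10 and recall is 8/12, so the score equals 16/22. Because the harmonic mean is pulled toward the lower of the two values, a recognizer cannot score well by favoring only precision or only recall.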
