Abstract

Human-Object Interaction (HOI) recognition, due to its significance in many computer vision-based applications, requires in-depth and meaningful details from image sequences. Incorporating semantics into scene understanding has led to a deeper understanding of human-centric actions. Therefore, in this research work, we propose a semantic HOI recognition system based on multi-vision sensors. In the proposed system, the RGB and depth images, de-noised via Bilateral Filtering (BLF), are segmented into multiple clusters using the Simple Linear Iterative Clustering (SLIC) algorithm. The skeleton is then extracted from the segmented RGB and depth images via the Euclidean Distance Transform (EDT). Human joints, extracted from the skeleton, provide the annotations for accurate pixel-level labeling. An elliptical human model is then generated via a Gaussian Mixture Model (GMM). A Conditional Random Field (CRF) model is trained to assign a specific label to each pixel of the different human body parts and the interaction object. Two semantic feature types are extracted from each labeled body part and labeled object: fiducial points and 3D point clouds. The feature descriptors are quantized using Fisher's Linear Discriminant Analysis (FLDA) and classified using K-ary Tree Hashing (KATH). In the experimentation phase, the recognition accuracy achieved is 92.88% on the Sports dataset, 93.5% on the Sun Yat-sen University (SYSU) 3D HOI dataset, and 94.16% on the Nanyang Technological University (NTU) RGB+D dataset. The proposed system is validated via extensive experimentation and should be applicable to many computer vision-based applications such as healthcare monitoring, security systems, and assisted living.
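
To make the front end of this pipeline concrete, the sketch below chains BLF de-noising, SLIC superpixel segmentation, and an EDT-based skeleton. It is a minimal sketch, not the paper's implementation: it assumes OpenCV, scikit-image, and SciPy are available, assumes a binary foreground mask has already been obtained, and uses illustrative parameter values.

import cv2
import numpy as np
from scipy.ndimage import distance_transform_edt
from skimage.morphology import skeletonize
from skimage.segmentation import slic

def preprocess(rgb, fg_mask):
    # rgb: HxWx3 uint8 frame; fg_mask: HxW boolean foreground mask
    # (the mask is an assumed input, not produced by this sketch).
    # Bilateral Filtering: edge-preserving de-noising of the frame.
    denoised = cv2.bilateralFilter(rgb, d=9, sigmaColor=75, sigmaSpace=75)
    # SLIC: cluster the de-noised frame into superpixels.
    segments = slic(denoised, n_segments=200, compactness=10, start_label=1)
    # Euclidean Distance Transform over the foreground; its ridge
    # approximates the medial axis, here recovered via skeletonize.
    edt = distance_transform_edt(fg_mask)
    skeleton = skeletonize(fg_mask)
    return denoised, segments, edt, skeleton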

Highlights

  • Understanding Human-Object Interaction (HOI) builds on Human Action Recognition (HAR) [1]

  • Human body parts are modeled via Gaussian-based elliptical modeling and labeled at the pixel level using a Conditional Random Field (CRF)

  • Two semantic feature types, i.e., fiducial points and 3D point clouds, are extracted from each labeled body part and object. These feature descriptors are optimized via Fisher's Linear Discriminant Analysis (FLDA) and classified with a K-ary Tree Hashing (KATH) classifier, as sketched below
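
A minimal sketch of this quantization-and-classification stage follows, assuming a descriptor matrix X_train and interaction labels y_train (both hypothetical names). FLDA is shown via scikit-learn's LinearDiscriminantAnalysis; KATH has no standard library implementation, so a support vector machine stands in purely to illustrate the train/predict flow and is not the paper's classifier.

from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

def build_recognizer(X_train, y_train):
    # FLDA projects the descriptors onto at most (n_classes - 1) axes
    # that maximize between-class over within-class scatter; the SVM
    # is a stand-in for KATH, which scikit-learn does not provide.
    model = make_pipeline(LinearDiscriminantAnalysis(), SVC(kernel="rbf"))
    return model.fit(X_train, y_train)

# Usage with hypothetical data: build_recognizer(X_train, y_train).predict(X_test)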

Summary

INTRODUCTION

Understanding Human-Object Interaction (HOI) builds on Human Action Recognition (HAR) [1]. In the proposed system, the different human body parts, along with their respective interaction object, are semantically segmented and labeled. In this way, the movements performed by each body part are recorded individually, resulting in an accurate HOI recognition system. The proposed system consists of four major modules: image normalization; human and object segmentation; human body part and object detection via elliptical modeling and pixel-level labeling; and HOI recognition via semantic feature extraction, dimensionality reduction, and classification. Each detected human body part and object is labeled at the pixel level, in both RGB and depth image sequences, via the CRF. The main contribution is accurate HOI recognition via unique semantic features extracted from each labeled body part and object. A sketch of the elliptical modeling step follows.
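
One way to picture the elliptical modeling module: fit a GMM to the foreground pixel coordinates and read each Gaussian component as one ellipse covering a body part. The sketch below follows that interpretation; the component count and the use of scikit-learn are assumptions, not details taken from the paper.

import numpy as np
from sklearn.mixture import GaussianMixture

def elliptical_model(fg_mask, n_parts=10):
    # fg_mask: HxW boolean human silhouette; n_parts is an assumed
    # number of body-part ellipses, not a value from the paper.
    ys, xs = np.nonzero(fg_mask)
    coords = np.column_stack([xs, ys]).astype(float)
    gmm = GaussianMixture(n_components=n_parts, covariance_type="full",
                          random_state=0).fit(coords)
    # Each (mean, covariance) pair defines an ellipse: the covariance
    # eigenvectors give its axes, the eigenvalues its squared semi-axes.
    return list(zip(gmm.means_, gmm.covariances_))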

RELATED WORK
EXPERIMENTAL SETUP AND RESULTS
DATASETS DESCRIPTION
The three datasets used for experimentation are the Sports dataset, the SYSU 3D HOI dataset, and the NTU RGB+D dataset.
Proposed Method
Findings
CONCLUSION