Abstract

Segmenting and recognizing human actions from a continuous, untrimmed sensor data stream is a challenging problem known as temporal action detection. This article proposes a two-stream You Only Look Once (YOLO)-based network that fuses the video and skeleton streams captured by a Kinect sensor, together with a data encoding method that turns temporal action detection into a one-dimensional object detection problem in a continuously augmented feature space. The proposed approach extracts spatial–temporal three-dimensional convolutional neural network features from the video stream and view-invariant features from the skeleton stream. These two streams are then encoded into three-dimensional feature spaces, represented as red, green, and blue images, for subsequent network input. The two-stream YOLO-based network fuses video and skeleton information through a processing pipeline that provides two fusion strategies, boxes-fusion and layers-fusion. We evaluate the temporal action detection performance of the two-stream YOLO network on our data set High-Speed Interplanetary Tug/Cocoon Vehicles-v1, which contains seven activities in a home environment, and achieve a particularly high mean average precision. We also test our model on the public data set PKU-MMD, which contains 51 activities, where our method again performs well. To show that our method can run efficiently on robots, we ported it to a robotic platform and conducted an online fall detection experiment.
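As a rough illustration of the encoding idea described above (a minimal sketch, not the authors' exact implementation), the snippet below stacks per-frame feature vectors into an image whose horizontal axis is time, so that a temporal action interval becomes a one-dimensional "box" along the image width. The function name and the min–max normalization are assumptions for illustration only.

```python
import numpy as np

def encode_stream_to_color_map(features):
    """Hypothetical sketch: turn per-frame features into an RGB-like image.

    features: (num_frames, feature_dim) array of per-frame descriptors
    returns:  (feature_dim, num_frames, 3) uint8 image where the horizontal
              axis is time, so a YOLO-style detector only needs to localize
              the horizontal (temporal) extent of each action.
    """
    fmin, fmax = features.min(), features.max()
    norm = (features - fmin) / (fmax - fmin + 1e-8)   # scale values to [0, 1]
    img = (norm.T * 255).astype(np.uint8)             # (feature_dim, num_frames)
    img = np.repeat(img[..., None], 3, axis=2)        # replicate into 3 channels
    # An action spanning frames [t0, t1] now corresponds to columns t0..t1,
    # i.e. a 1D box along the width of the encoded image.
    return img
```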

Highlights

  • In modern society, more and more people spend their later years living alone

  • We propose a novel idea that transforms the temporal action detection (TAD) problem into a 1D object detection problem, based on a two-stream YOLO-based deep learning network that fuses the encoded video and skeleton streams of a Kinect sensor

  • We propose a TAD deep learning network that incorporates video and skeleton information from a Kinect sensor


Summary

Introduction

More and more people spend their later years living alone. This poses a potential risk, because elderly people may be unable to take adequate care of themselves. In the skeleton stream, we apply a view-invariant transform to the skeleton sequence to extract a view-invariant feature (VIF), which eliminates the influence of view changes and retains more relative motion than some traditional skeleton-based methods.[18] The skeleton VIF sequence is encoded into a color map whose width indicates the timeline and whose height equals the number of skeleton joints, preserving both the temporal and the spatial information of the original action. A two-stream YOLO-based network then performs TAD on the color maps encoded from the two input streams, providing two information fusion strategies, boxes-fusion and layers-fusion. In this way, we propose a novel idea that transforms the TAD problem into a 1D object detection problem based on a two-stream YOLO-based deep learning network that fuses the encoded video and skeleton streams of a Kinect sensor.
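The skeleton-to-color-map encoding described above can be sketched roughly as follows (an assumed implementation for illustration; the function name, array layout, and per-axis normalization are not taken from the paper). Each frame's view-invariant joint coordinates become one column of the map, with the x, y, and z axes mapped to the R, G, and B channels.

```python
import numpy as np

def skeleton_to_color_map(vif_sequence):
    """Hypothetical encoding of a view-invariant skeleton sequence.

    vif_sequence: (num_frames, num_joints, 3) array of view-invariant
                  joint coordinates.
    returns:      (num_joints, num_frames, 3) uint8 color map whose width
                  is the timeline, whose height is the joint index, and
                  whose R, G, B channels hold the normalized x, y, z values.
    """
    seq = np.asarray(vif_sequence, dtype=np.float32)
    lo = seq.min(axis=(0, 1), keepdims=True)          # per-axis minima
    hi = seq.max(axis=(0, 1), keepdims=True)          # per-axis maxima
    norm = (seq - lo) / (hi - lo + 1e-8)              # scale each axis to [0, 1]
    img = (norm * 255).astype(np.uint8)               # (frames, joints, 3)
    return img.transpose(1, 0, 2)                     # joints x frames x 3
```

Because the width of the map is the timeline, the temporal extent of an action is preserved as a horizontal span, which is what allows a YOLO-style detector to localize actions as 1D boxes.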

Related works
Experiments
Methods
Conclusion
Findings
Declaration of conflicting interests