Abstract

This work proposes a novel unsupervised self-organizing network, called the Self-Organizing Convolutional Echo State Network (SO-ConvESN), for learning node centroids and interconnectivity maps compatible with the deterministic initialization of Echo State Network (ESN) input and reservoir weights, in the context of human action recognition (HAR). To ensure stability and the echo state property in the reservoir, Recurrence Plots (RPs) and Recurrence Quantification Analysis (RQA) are used to explain and characterize the reservoir dynamics and hence to tune the ESN hyperparameters. The optimized self-organizing reservoirs are cascaded with a Convolutional Neural Network (CNN) so that the activations of the internal echo state representations (ESRs) echo the topological qualities and temporal features of the input time series, and the CNN efficiently learns the dynamics and multiscale temporal features from the ESRs for action recognition. Hyperparameter optimization (HPO) algorithms are additionally adopted to optimize the CNN stage of SO-ConvESN. Experimental results on the HAR problem, using several publicly available 3D-skeleton-based action datasets, demonstrate the value of RPs and RQA in examining the explainability of reservoir dynamics for designing stable self-organizing reservoirs, and the usefulness of HPO in SO-ConvESN for the HAR task. The proposed SO-ConvESN exhibits competitive recognition accuracy.

Highlights

  • Human action recognition (HAR) has been an active research field for interpreting human intentions

  • Existing work lacks explainability considerations for understanding the input-dependent reservoir dynamics in HAR

  • In this work we propose a novel reservoir design approach known as the Self-Organizing Reservoir Network with Explainability (SORN-E), which is characterised by (i) the integration of the Adaptive Resonance Theory (ART) [16] architecture with topology construction based on Instantaneous Topological Mapping (ITM) [17] for the self-organization of the input weights and reservoir weights, and (ii) hyperparameter tuning based on the explainability of the self-organizing reservoir through Recurrence Plots (RPs) and Recurrence Quantification Analysis (RQA) [18]
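
To make the RP and RQA idea in the last highlight concrete, here is a minimal sketch, not the authors' implementation. It assumes a Euclidean distance threshold `eps` and computes two common RQA measures, recurrence rate and determinism, on a toy periodic trajectory that stands in for a sequence of reservoir states. All function names, the threshold, and the toy data are illustrative assumptions.

```python
import math

def recurrence_matrix(states, eps):
    """Binary recurrence matrix: R[i][j] = 1 when states i and j lie within eps."""
    n = len(states)
    return [[1 if math.dist(states[i], states[j]) <= eps else 0 for j in range(n)]
            for i in range(n)]

def recurrence_rate(R):
    """Fraction of recurrent points in the matrix (main diagonal included)."""
    n = len(R)
    return sum(map(sum, R)) / (n * n)

def determinism(R, l_min=2):
    """Share of off-diagonal recurrent points on diagonal lines of length >= l_min."""
    n = len(R)
    on_lines = 0
    total = 0
    for k in range(1, n):          # upper diagonals only; R is symmetric
        run = 0
        for i in range(n - k):
            if R[i][i + k]:
                run += 1
                total += 1
            else:
                if run >= l_min:
                    on_lines += run
                run = 0
        if run >= l_min:           # flush a run that reaches the matrix edge
            on_lines += run
    return on_lines / total if total else 0.0

# Toy stand-in for reservoir states: a 2-D trajectory with period 10.
states = [(math.sin(2 * math.pi * t / 10), math.cos(2 * math.pi * t / 10))
          for t in range(40)]
R = recurrence_matrix(states, eps=0.1)
print(recurrence_rate(R), determinism(R))
```

A strictly periodic trajectory produces long unbroken diagonal lines, so its determinism is high. Irregular dynamics would break those lines into isolated points and lower the value.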


Summary

Introduction

Human action recognition (HAR) has been an active research field for interpreting human intentions. Fixed, randomly initialized neuron weights can cause the recognition performance of ESN-based approaches to vary, even on the same task with an identical hyperparameter configuration [13]. Because the input and reservoir weights are randomized, repeated runs hardly reproduce the same performance.

The first dataset is MSRA3D, composed of 567 sequences with 23,797 skeleton frames recorded at 15 fps, with each action performed by ten different subjects 2 or 3 times. It is one of the most widely used HAR benchmark datasets, acquired with a Kinect-like sensor that captures 20 skeleton joints for 20 different activities: high arm wave, horizontal arm wave, hammer, hand catch, forward punch, high throw, draw X, draw tick, draw a circle, hand clap, two-hand wave, side boxing, bend, forward kick, side kick, jogging, tennis swing, tennis serve, golf swing, and pick-up and throw. This high intraclass variation makes the recognition task more challenging.
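
The reproducibility issue with randomized ESN weights noted in this section can be illustrated with a small sketch, assuming a plain tanh reservoir without leaking or output feedback. The helpers `make_reservoir` and `run_reservoir`, the reservoir size, and the sine input are hypothetical choices for illustration and are not taken from the paper.

```python
import math
import random

def make_reservoir(n_in, n_res, seed):
    """Randomly initialised, fixed input and reservoir weights (illustrative helper)."""
    rng = random.Random(seed)
    w_in = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_res)]
    # Scale so every row's absolute sum is at most 0.9. An infinity norm below 1
    # is a sufficient (conservative) condition for the echo state property with tanh.
    w_res = [[rng.uniform(-1.0, 1.0) * 0.9 / n_res for _ in range(n_res)]
             for _ in range(n_res)]
    return w_in, w_res

def run_reservoir(w_in, w_res, inputs):
    """Final state of x(t) = tanh(W_in u(t) + W_res x(t-1)), starting from x(0) = 0."""
    x = [0.0] * len(w_res)
    for u in inputs:
        x = [math.tanh(sum(a * b for a, b in zip(wi, u)) +
                       sum(a * b for a, b in zip(wr, x)))
             for wi, wr in zip(w_in, w_res)]
    return x

inputs = [[math.sin(0.3 * t)] for t in range(50)]   # toy one-channel time series
state_a = run_reservoir(*make_reservoir(1, 5, seed=0), inputs)
state_b = run_reservoir(*make_reservoir(1, 5, seed=1), inputs)
print(state_a)
print(state_b)   # same input and hyperparameters, different seed, different states
```

Only the seed differs between the two runs, yet the echo state representations differ, so any readout trained on them can differ as well. A deterministic initialization removes this source of run-to-run variance.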
