Symmetrically Stacked Long Short-Term Memory Networks for Fall Event Recognition Using Compact Convolutional Neural Networks-Based Tracker

Nur Ayuni Mohamed, Nor Azwan Mohamed Kamari, Mohd Asyraf Zulkifley, Zulaikha Kadim

doi:10.3390/sym14020293

Open Access

Journal: Symmetry	Publication Date: Feb 1, 2022
Citations: 3	License type: CC BY 4.0

Affiliation: National University of Malaysia, MIMOS (Malaysia)

Abstract

In recent years, advances in pattern recognition algorithms, specifically deep learning techniques, have propelled a tremendous amount of research on fall event recognition systems. It is important to detect a fall incident as early as possible, since even a slight delay in providing assistance can result in severe, unrecoverable injuries. One of the main challenges in fall event recognition is the imbalance of training data between fall and no-fall events, because a real-life fall is a sporadic event that occurs infrequently. Most recent techniques produce many false alarms, as it is hard to train them to cover a wide range of fall situations. Hence, this paper aims to detect the exact fall frame in a video sequence, so that recognition does not depend on the whole clip. The proposed approach consists of a two-stage module, where the first stage employs a compact convolutional neural network tracker to generate the object trajectory information. Features of interest are sampled from the generated trajectory paths and fed as the input to the second stage. The second-stage network then models the temporal dependencies of the trajectory information using a symmetrical Long Short-Term Memory (LSTM) architecture. This two-stage module is a novel approach, as most existing techniques rely on a detection module rather than a tracking module. The simulation experiments were conducted on the Fall Detection Dataset (FDD). The proposed approach obtains an expected average overlap of 0.167, the best performance compared with the Multi-Domain Network (MDNET) and Tree-structured Convolutional Neural Network (TCNN) trackers. Furthermore, the proposed three-layer stacked LSTM architecture also performs best compared with the vanilla recurrent neural network and the single-layer LSTM.
This approach could be further improved if the tracker model were first pre-tuned offline on the specific type of object of interest rather than on generic objects.
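As a rough illustration of the two-stage idea in the abstract, the sketch below turns a sequence of tracker boxes into per-frame trajectory features and passes them through a three-layer stacked LSTM that scores every frame, so that a fall can be localised to a single frame rather than to a whole clip. The feature set, layer sizes, random untrained weights, 0.5 threshold, and all names (`trajectory_features`, `StackedLSTMClassifier`, `first_fall_frame`) are hypothetical assumptions; the paper's actual features, symmetric layer design, and training procedure are not reproduced here. It uses plain Python so the gate equations stay visible.

```python
import math
import random
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h) of a tracked box, in pixels


def trajectory_features(boxes: List[Box], frame_w: float, frame_h: float) -> List[List[float]]:
    """Hypothetical per-frame features sampled from a tracker trajectory:
    normalised centre, normalised size, aspect ratio and centre velocity."""
    feats: List[List[float]] = []
    prev: Optional[Tuple[float, float]] = None
    for x, y, w, h in boxes:
        cx, cy = (x + w / 2.0) / frame_w, (y + h / 2.0) / frame_h
        vx, vy = (0.0, 0.0) if prev is None else (cx - prev[0], cy - prev[1])
        feats.append([cx, cy, w / frame_w, h / frame_h, w / h, vx, vy])
        prev = (cx, cy)
    return feats


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class LSTMLayer:
    """One LSTM layer with input (i), forget (f), candidate (g) and output (o) gates."""

    def __init__(self, n_in: int, n_hidden: int, rng: random.Random) -> None:
        self.n = n_hidden
        # 4 * n_hidden weight rows, each applied to the concatenation [x_t; h_{t-1}].
        self.W = [[rng.uniform(-0.1, 0.1) for _ in range(n_in + n_hidden)]
                  for _ in range(4 * n_hidden)]
        self.b = [0.0] * (4 * n_hidden)

    def run(self, xs: List[List[float]]) -> List[List[float]]:
        n = self.n
        h, c = [0.0] * n, [0.0] * n
        outputs = []
        for x in xs:
            z = x + h  # list concatenation: [x_t; h_{t-1}]
            a = [sum(w * v for w, v in zip(row, z)) + bias
                 for row, bias in zip(self.W, self.b)]
            i = [sigmoid(v) for v in a[0:n]]
            f = [sigmoid(v) for v in a[n:2 * n]]
            g = [math.tanh(v) for v in a[2 * n:3 * n]]
            o = [sigmoid(v) for v in a[3 * n:4 * n]]
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]  # c_t
            h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]               # h_t
            outputs.append(h)
        return outputs


class StackedLSTMClassifier:
    """Stack of LSTM layers: each layer consumes the previous layer's hidden
    sequence, and a logistic head scores every frame (weights are untrained)."""

    def __init__(self, n_in: int, n_hidden: int = 16, n_layers: int = 3, seed: int = 0) -> None:
        rng = random.Random(seed)
        dims = [n_in] + [n_hidden] * n_layers
        self.layers = [LSTMLayer(dims[k], dims[k + 1], rng) for k in range(n_layers)]
        self.w_out = [rng.uniform(-0.1, 0.1) for _ in range(n_hidden)]
        self.b_out = 0.0

    def frame_probabilities(self, feats: List[List[float]]) -> List[float]:
        seq = feats
        for layer in self.layers:
            seq = layer.run(seq)
        return [sigmoid(sum(w * v for w, v in zip(self.w_out, h)) + self.b_out) for h in seq]


def first_fall_frame(probs: List[float], threshold: float = 0.5) -> Optional[int]:
    # Index of the first frame whose fall probability reaches the threshold, if any.
    for t, p in enumerate(probs):
        if p >= threshold:
            return t
    return None


if __name__ == "__main__":
    boxes = [(100.0, 50.0, 40.0, 120.0), (102.0, 60.0, 50.0, 110.0), (105.0, 120.0, 110.0, 45.0)]
    model = StackedLSTMClassifier(n_in=7)
    probs = model.frame_probabilities(trajectory_features(boxes, 320.0, 240.0))
    print(len(probs), first_fall_frame(probs))
```

Because the weights are random, the printed scores carry no meaning; the example only shows how a per-frame output lets the fall be tied to one frame. Stacking means each LSTM layer re-reads the full hidden sequence of the layer below, which is the distinction the abstract draws between the three-layer and single-layer variants.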

Highlights

We focus only on vision-based systems for fall event recognition
The MMCNN, Tree-structured Convolutional Neural Network (TCNN), and Multi-Domain Network (MDNET) trackers have been configured as model-free trackers, such that the object appearance model is learned from the first-frame ground-truth bounding box, Bgt
Our core approach relies on deep learning representations, where a fully CNN-based tracker is employed to detect and track the object of interest in a video sequence
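The highlights state that the trackers are model-free and initialised from the first-frame ground-truth box Bgt, and the abstract reports tracking accuracy as expected average overlap. The sketch below shows the per-frame intersection-over-union (IoU) overlap on which such measures build, plus a simplified mean overlap over a sequence. It is not the full VOT expected average overlap protocol, which additionally handles tracker failures and averages over a range of sequence lengths; the (x, y, w, h) box convention, the function names, and the choice to exclude the initialisation frame by default are assumptions for illustration.

```python
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h): top-left corner, width, height


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def average_overlap(pred: List[Optional[Box]], gt: List[Box], skip_first: bool = True) -> float:
    """Mean per-frame IoU (simplified, not VOT EAO). A frame where the tracker
    reports no box scores 0. Frame 0 is skipped by default because a model-free
    tracker is initialised from the ground-truth box there."""
    if len(pred) != len(gt):
        raise ValueError("pred and gt must have the same number of frames")
    start = 1 if skip_first else 0
    scores = [0.0 if p is None else iou(p, g) for p, g in zip(pred[start:], gt[start:])]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    gt = [(0.0, 0.0, 10.0, 10.0), (2.0, 0.0, 10.0, 10.0), (4.0, 0.0, 10.0, 10.0)]
    pred = [gt[0], (1.0, 0.0, 10.0, 10.0), None]  # tracker drifts, then loses the target
    print(average_overlap(pred, gt))
```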

Summary

Introduction

Nowadays, advancements in both machine learning and computer vision have propelled the number of studies in activity recognition applications. These studies are mostly geared towards intelligent systems that focus on monitoring and analysing daily human activities. Most of them emphasize developing algorithms for classifying normal human activities, known as Activities of Daily Living (ADL), such as sitting, walking, standing, and crouching down. These algorithms are used to assess the physical fitness and movement quality of an individual. Early detection of abnormal activities is indispensable for preventing the adverse consequences of a delayed response.