Activity Recognition Research Articles

Non-nutritive sucking (NNS), which refers to the act of sucking on a pacifier, finger, or similar object without nutrient intake, plays a crucial role in assessing healthy early development. In the case of preterm infants, NNS behavior is a key component in determining their readiness for feeding. In older infants, the characteristics of NNS behavior offer valuable insights into neural and motor development. Additionally, NNS activity has been proposed as a potential safeguard against sudden infant death syndrome (SIDS). However, the clinical application of NNS assessment is currently hindered by labor-intensive and subjective finger-in-mouth evaluations. Consequently, researchers often resort to expensive pressure transducers for objective NNS signal measurement. To enhance the accessibility and reliability of NNS signal monitoring for both clinicians and researchers, we introduce a vision-based algorithm designed for non-contact detection of NNS activity using baby monitor footage in natural settings. Our approach involves a comprehensive exploration of optical flow and temporal convolutional networks, enabling the detection and amplification of subtle infant-sucking signals. We successfully classify short video clips of uniform length into NNS and non-NNS periods. Furthermore, we investigate manual and learning-based techniques to piece together local classification results, facilitating the segmentation of longer mixed-activity videos into NNS and non-NNS segments of varying duration. Our research introduces two novel datasets of annotated infant videos, including one sourced from our clinical study featuring 18 infant subjects and 183 h of overnight baby monitor footage. Additionally, we incorporate a second, shorter dataset obtained from publicly available YouTube videos. Our NNS action recognition algorithm achieves an impressive 95.8% accuracy in binary classification, based on 960 2.5-s balanced NNS versus non-NNS clips from our clinical dataset. We also present results for a subset of clips featuring challenging video conditions. Moreover, our NNS action segmentation algorithm achieves an average precision of 93.5% and an average recall of 92.9% across 30 heterogeneous 60-s clips from our clinical dataset.
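
The step of piecing local clip-level classifications together into longer segments can be illustrated with a minimal sketch. It is not the authors' method: it assumes non-overlapping 2.5-s clips, binary labels (1 for NNS, 0 for non-NNS), and a simple rule that merges consecutive clips with the same label. The function name `merge_clip_labels` and its parameters are hypothetical.

```python
from typing import List, Tuple


def merge_clip_labels(
    labels: List[int], clip_seconds: float = 2.5
) -> List[Tuple[float, float, int]]:
    """Merge runs of identical per-clip labels into (start, end, label) segments.

    Assumption: clips are consecutive and non-overlapping, each lasting
    `clip_seconds`. This is an illustrative rule, not the published algorithm.
    """
    segments: List[Tuple[float, float, int]] = []
    if not labels:
        return segments

    run_start = 0  # index of the first clip in the current run
    for i in range(1, len(labels) + 1):
        # Close the run at the end of the list or when the label changes.
        if i == len(labels) or labels[i] != labels[run_start]:
            segments.append(
                (run_start * clip_seconds, i * clip_seconds, labels[run_start])
            )
            run_start = i
    return segments


if __name__ == "__main__":
    # Six hypothetical clip predictions covering 15 s of video.
    for start, end, label in merge_clip_labels([1, 1, 0, 0, 0, 1]):
        name = "NNS" if label == 1 else "non-NNS"
        print(f"{start:5.1f}s - {end:5.1f}s  {name}")
```

A learned aggregation, as the abstract mentions, would replace the run-merging rule with a model over the sequence of clip scores; the sketch only shows the simplest manual variant.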

Large labeled datasets are crucial for video understanding progress. However, the labeling process is time-consuming, expensive, and tiresome. To overcome this impediment, various pretexts use the temporal coherence in videos to learn visual representations in a self-supervised manner. However, these pretexts (order verification and sequence sorting) struggle when encountering cyclic actions due to the label ambiguity problem. To overcome these limitations, we present a novel temporal pretext task to address self-supervised learning of visual representations from unlabeled videos. Repeated Scene Localization (RSL) is a multi-class classification pretext that involves changing the temporal order of the frames in a video by repeating a scene. Then, the network is trained to identify the modified video, localize the location of the repeated scene, and identify the unmodified original videos that do not have repeated scenes. We evaluated the proposed pretext on two benchmark datasets, UCF-101 and HMDB-51. The experimental results show that the proposed pretext achieves state-of-the-art results in action recognition and video retrieval tasks. In action recognition, our S3D model achieves 88.15% and 56.86% on UCF-101 and HMDB-51, respectively. It outperforms the current state-of-the-art by 1.05% and 3.26%. Our R(2+1)D-Adjacent model achieves 83.52% and 54.50% on UCF-101 and HMDB-51, respectively. It outperforms the single pretext tasks by 8.7% and 13.9%. In video retrieval, our R(2+1)D-Offset model outperforms the single pretext tasks by 4.68% and 1.1% Top 1 accuracies on UCF-101 and HMDB-51, respectively. The source code and the trained models are publicly available at https://github.com/Hussein-A-Hassan/RSL-Pretext.
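
A rough sketch of how training samples for a repeated-scene pretext might be built is given below. The abstract does not specify the segment count, how the repeated scene is placed, or the label encoding, so all of these are assumptions made for illustration: the clip is split into `num_segments` equal segments, label 0 means the clip is unmodified, and label k means segment k-1 has been copied over segment k. The names `repeat_scene` and `make_rsl_sample` are hypothetical and are not taken from the linked repository.

```python
import random
from typing import List, Optional, Sequence, Tuple, TypeVar

Frame = TypeVar("Frame")


def repeat_scene(
    frames: Sequence[Frame], label: int, num_segments: int = 4
) -> List[Frame]:
    """Return a copy of `frames` modified according to `label`.

    Assumed encoding (illustrative only):
      label 0            -> clip left unmodified
      label k (1..K-1)   -> segment k-1 is repeated in place of segment k
    """
    if not 0 <= label < num_segments:
        raise ValueError("label must be in [0, num_segments)")
    seg_len = len(frames) // num_segments
    if seg_len == 0:
        raise ValueError("clip is shorter than num_segments frames")

    # Drop any leftover frames so every segment has the same length.
    out = list(frames[: seg_len * num_segments])
    if label > 0:
        src = (label - 1) * seg_len
        dst = label * seg_len
        out[dst : dst + seg_len] = out[src : src + seg_len]
    return out


def make_rsl_sample(
    frames: Sequence[Frame],
    num_segments: int = 4,
    rng: Optional[random.Random] = None,
) -> Tuple[List[Frame], int]:
    """Draw a random label and return (modified clip, label) for training."""
    rng = rng or random.Random()
    label = rng.randrange(num_segments)
    return repeat_scene(frames, label, num_segments), label


if __name__ == "__main__":
    clip = list(range(8))  # frame indices stand in for image tensors
    print(repeat_scene(clip, 0))  # unmodified clip
    print(repeat_scene(clip, 2))  # segment 1 repeated in place of segment 2
```

Under this encoding a classifier with `num_segments` output classes would both detect whether a clip was modified and localize the repeated scene, which matches the multi-class framing described in the abstract.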

Articles published on Activity Recognition

Active sonar target recognition method based on multi‐domain transformations and attention‐based fusion network

Attention-guided mask learning for self-supervised 3D action recognition

Subtle signals: Video-based detection of infant non-nutritive sucking as a neurodevelopmental cue

Home Activity Recognition for Rural Elderly Based on Deep Learning and Smartphone Sensors

Repeat and learn: Self-supervised visual representations learning by Repeated Scene Localization

Full wireless goniometer design with activity recognition for upper and lower limb

Fall Detection Method for Infrared Videos Based on Spatial-Temporal Graph Convolutional Network

Progressively-orthogonally-mapped EfficientNet for action recognition on time-range-Doppler signature

Enhancing Human Activity Recognition through Integrated Multimodal Analysis: A Focus on RGB Imaging, Skeletal Tracking, and Pose Estimation

Volleyball training video classification description using the BiLSTM fusion attention mechanism

Human action recognition using an optical flow-gated recurrent neural network

FMCW Radar Human Action Recognition Based on Asymmetric Convolutional Residual Blocks

Exploring global context and position-aware representation for group activity recognition

Radar Feature Analysis of Human Activity Recognition Under Multiview Scenes

Sensors-Based Human Activity Recognition Using Hybrid Features and Deep Capsule Network

Complex Human Activity Recognition Based on Spatial LSTM and Deep Residual Convolutional Network Using Wearable Motion Sensors

A Data-Driven Feature Extraction Method Based on Data Supplement for Human Activity Recognition

Convolutional Block Attention Module-Multimodal Feature-Fusion Action Recognition: Enabling Miner Unsafe Action Recognition

ESC-ZSAR: Expanded Semantics from Categories with Cross-Attention for Zero-Shot Action Recognition

Semi-supervised human action recognition via dual-stream cross-fusion and class-aware memory bank
