Comparing Sampling Strategies for Tackling Imbalanced Data in Human Activity Recognition.

Fayez Alharbi, Jamie A Ward, Lahcen Ouarbya

doi:10.3390/s22041373

Open Access

https://doi.org/10.3390/s22041373

Journal: Sensors
Publication Date: Feb 11, 2022
Citations: 14
License type: CC BY 4.0

Affiliation: Majmaah University, Goldsmiths University of London

Abstract

Human activity recognition (HAR) using wearable sensors is an increasingly active research topic in machine learning, aided in part by the ready availability of detailed motion capture data from smartphones, fitness trackers, and smartwatches. The goal of HAR is to use such devices to assist users in their daily lives in application areas such as healthcare, physical therapy, and fitness. One of the main challenges for HAR, particularly when using supervised learning methods, is obtaining balanced data for algorithm optimisation and testing. As people perform some activities more than others (e.g., walk more than run), HAR datasets are typically imbalanced. The lack of dataset representation from minority classes hinders the ability of HAR classifiers to sufficiently capture new instances of those activities. We introduce three novel hybrid sampling strategies to generate more diverse synthetic samples to overcome the class imbalance problem. The first strategy, which we call the distance-based method (DBM), combines the Synthetic Minority Oversampling Technique (SMOTE) with Random_SMOTE, both of which are built around the k-nearest neighbors (KNN) algorithm. The second technique, referred to as the noise detection-based method (NDBM), combines SMOTE Tomek links (SMOTE_Tomeklinks) and the modified synthetic minority oversampling technique (MSMOTE). The third approach, which we call the cluster-based method (CBM), combines Cluster-Based Synthetic Oversampling (CBSO) and the Proximity Weighted Synthetic Oversampling Technique (ProWSyn). We compare the performance of the proposed hybrid methods to the individual constituent methods and a baseline using accelerometer data from three commonly used benchmark datasets. We show that DBM, NDBM, and CBM reduce the impact of class imbalance and improve F1 scores by 9–20 percentage points compared to their constituent sampling methods. CBM performs significantly better than the others under a Friedman test; however, DBM has lower computational requirements.
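
To make the oversampling idea in the abstract concrete, the following is a minimal sketch of the basic SMOTE interpolation step only: a synthetic sample is placed at a random point on the line segment between a minority sample and one of its k nearest minority neighbours. It is not the authors' DBM, NDBM, or CBM implementation. The function name `smote_sketch`, its parameters, and the toy `run_windows` data are assumptions made for illustration.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority samples.

    minority: list of equal-length feature vectors (lists of floats).
    This is an illustrative sketch of plain SMOTE, not a hybrid method.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base sample.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        # Pick one of the k nearest neighbours at random.
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy features for an under-represented activity (e.g., running).
run_windows = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
new_windows = smote_sketch(run_windows, n_new=4, k=2, seed=1)
print(len(new_windows))
```

Because every synthetic point lies between two existing minority samples, the new data stay inside the region the minority class already occupies. Producing more diverse samples than this is the limitation the hybrid strategies described above aim to address.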

Highlights

Human activity recognition (HAR) using body-worn or wearable sensors is an active research topic in mobile and ubiquitous computing [1].
The cluster-based method (CBM) performs significantly better than the others under a Friedman test, while the distance-based method (DBM) has lower computational requirements.
We show that, for imbalanced human activity data, the sampling methods are useful only for improving the performance of the multilayer perceptron (MLP), as opposed to the other classifiers.

Summary

Introduction

Human activity recognition (HAR) using body-worn or wearable sensors is an active research topic in mobile and ubiquitous computing [1]. Activity recognition is a useful tool because it provides information on an individual’s behaviour that enables computing systems to monitor, analyse, and assist with a range of day-to-day tasks [2,3]. Most HAR studies adopt a supervised learning approach [4]. Supervised learning typically requires large amounts of labelled sensor data for training [2]. For such models to work well, the data are ideally recorded from a variety of real-world situations. A diversity of sensor modalities and placements can help improve recognition performance [5,6].
