Abstract

Human Activity Recognition (HAR) is a field with many contrasting application domains, from medical applications to ambient assisted living and sports applications. With ever-changing use cases and devices also comes a need for newer and better HAR approaches. Machine learning has long been one of the predominant techniques for recognizing activities from extracted features. With the advent of deep learning techniques that push state-of-the-art results in many domains, such as natural language processing and computer vision, researchers have also started to build deep neural nets for HAR. This increase in complexity makes it necessary to compare the newer approaches to the previous state-of-the-art algorithms, because not everything that is new is also better. Therefore, this paper compares typical machine learning models, such as a Random Forest (RF) and a Support Vector Machine (SVM), to two commonly used deep neural net architectures, Convolutional Neural Nets (CNNs) and Recurrent Neural Nets (RNNs), with regard to both performance and model complexity. We measure complexity as the memory consumption, the mean prediction time and the number of trainable parameters of the models. To achieve comparable results, all models are tested on the same publicly available dataset, the UCI HAR Smartphone dataset. With this combination of prediction performance and model complexity, we look for the models that achieve the best possible performance/complexity tradeoff and are therefore the most favourable for use in an application. According to our findings, the best model for a strictly memory-limited use case is the Random Forest, with an F1-Score of 88.34%, memory consumption of only 0.1 MB and a mean prediction time of 0.22 ms. The overall best model in terms of complexity and performance is the SVM with a linear kernel, with an F1-Score of 95.62%, memory consumption of 2 MB and a mean prediction time of 0.47 ms. The two deep neural nets are on par in terms of performance, but their increased complexity makes them less favourable.
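The three complexity measures named in the abstract can be illustrated with a minimal sketch. It is not the paper's measurement code: the toy linear model, the helper names, the use of pickled size as a stand-in for memory consumption, and the one-vs-rest parameter count are all assumptions made for illustration. The 561 features and 6 classes match the published UCI HAR Smartphone dataset.

```python
import pickle
import time
from statistics import mean

N_FEATURES = 561  # feature vector length of the UCI HAR Smartphone dataset
N_CLASSES = 6     # number of activities in that dataset


def pickled_size_bytes(params):
    """Size of the serialized parameters (assumed proxy for memory consumption)."""
    return len(pickle.dumps(params))


def mean_prediction_time_ms(predict, samples, repeats=5):
    """Mean wall-clock time of a single-sample prediction, in milliseconds."""
    timings = []
    for _ in range(repeats):
        for x in samples:
            start = time.perf_counter()
            predict(x)
            timings.append((time.perf_counter() - start) * 1000.0)
    return mean(timings)


def linear_param_count(n_features, n_classes):
    """Assumed count: one weight vector plus one bias per class (one-vs-rest)."""
    return n_classes * (n_features + 1)


def make_toy_params(n_features, n_classes):
    """Hypothetical stand-in for trained weights; the values are arbitrary."""
    return {
        "weights": [[0.01 * (c + 1)] * n_features for c in range(n_classes)],
        "bias": [0.0] * n_classes,
    }


def predict(params, x):
    """Return the index of the class with the highest linear score."""
    scores = [
        sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
        for w, b in zip(params["weights"], params["bias"])
    ]
    return scores.index(max(scores))


if __name__ == "__main__":
    params = make_toy_params(N_FEATURES, N_CLASSES)
    samples = [[1.0] * N_FEATURES for _ in range(20)]
    print("trainable parameters:", linear_param_count(N_FEATURES, N_CLASSES))
    print("serialized size (bytes):", pickled_size_bytes(params))
    print("mean prediction time (ms):",
          mean_prediction_time_ms(lambda x: predict(params, x), samples))
```

Timing single-sample predictions rather than a whole batch mirrors how a deployed HAR application would typically call the model, although absolute numbers depend heavily on hardware and implementation.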

Highlights

  • Due to the number of possibilities where Human Activity Recognition (HAR) can be applied, it is a heavily researched field, with application scenarios ranging from medical applications, ambient assisted living, sports and leisure, and tele-immersion to security surveillance

  • As the number of trainable parameters is a metric typically only used to compare deep neural nets, we aim to introduce formulas for the approximation of the trainable parameters for the used machine learning models and find a metric that can give an initial indication on the relative complexity across model types

  • The first is that the best performing model in terms of F1-Score or accuracy is not one of the deep neural nets, but one of the machine learning models, the Support Vector Machine (SVM) with radial basis function (RBF) kernel with an F1-Score of 96.02% followed closely by the SVM with linear kernel with 95.62%


Summary

Introduction

Due to the number of possibilities where Human Activity Recognition (HAR) can be applied, it is a heavily researched field, with application scenarios ranging from medical applications, ambient assisted living, sports and leisure, and tele-immersion to security surveillance. These contrasting use cases come with very specific requirements, which in turn call for very specific approaches. For example, a security surveillance system that recognizes criminal activities in a public place inherently needs a vision-based approach, as there is no possibility of equipping any of the subjects with sensors. As some applications evolve, new use cases are created and new devices are introduced, the need for ever-evolving HAR approaches arises.

