Sensors embedded in devices are a source of temporal data that can be interpreted to infer a user's context. A smartphone's accelerometer generates data streams that form distinct patterns in response to user activities. Human context can be predicted using deep learning models built from raw sensor data or from features extracted from it. This study analyzes data streams from the public UCI-HAR activity recognition dataset to compute 31 handcrafted features in the time and frequency domains. Various stacked and combined RNN models with attention mechanisms are designed to work with the computed features. The attention mechanism improved the models' fit. When trained on all features, the two-stacked GRU model performed best, with 99% accuracy. Selecting the most promising features helps reduce training time without compromising accuracy. The rankings supplied by the permutation feature importance measure and by Shapley values are used to identify the best features from among the highly correlated features. Models trained on the optimal features identified by these importance measures achieved 96% accuracy. Misclassification in the attention-based classifiers occurs between dynamic activities, such as walking upstairs and walking downstairs, and between sedentary activities, such as sitting and standing, because the axis values of these activities fall within similar ranges. Our research emphasizes streamlined neural network architectures with fewer layers and fewer neurons than existing models, yielding lightweight models suitable for deployment on resource-constrained devices.
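The abstract describes a two-stacked GRU with attention trained on handcrafted feature windows, but does not spell out the architecture. The sketch below is a minimal illustrative example in Keras, not the authors' released code; the window length, layer sizes, and the additive-attention formulation are assumptions.

```python
# Illustrative sketch (assumptions, not the paper's exact model): a two-stacked
# GRU classifier with a simple attention pooling layer over feature windows.
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_FEATURES = 31   # handcrafted time/frequency-domain features (per the abstract)
TIMESTEPS = 128     # assumed window length; UCI-HAR uses 128-sample windows
NUM_CLASSES = 6     # UCI-HAR activity classes

inputs = layers.Input(shape=(TIMESTEPS, NUM_FEATURES))

# Two stacked GRU layers; both return sequences so attention can weight timesteps.
x = layers.GRU(64, return_sequences=True)(inputs)
x = layers.GRU(32, return_sequences=True)(x)

# Additive-style attention pooling: score each timestep, softmax over time,
# then take the weighted sum of GRU outputs as the context vector.
scores = layers.Dense(1, activation="tanh")(x)              # (batch, time, 1)
weights = layers.Softmax(axis=1)(scores)                    # attention weights per timestep
context = layers.Lambda(lambda t: tf.reduce_sum(t[0] * t[1], axis=1))([x, weights])

outputs = layers.Dense(NUM_CLASSES, activation="softmax")(context)
model = models.Model(inputs, outputs)
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.summary()
```

The small layer widths reflect the abstract's emphasis on lightweight models with fewer layers and neurons for resource-constrained devices; the actual sizes used in the study may differ.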