With their automatic feature extraction capabilities, deep learning models have become widespread in sensor-based human activity recognition, particularly on larger datasets. However, deploying them directly on mobile and wearable devices is challenging due to their substantial resource requirements. Concurrently, attention-based models have emerged, particularly in the computer vision domain, to improve recognition performance by dynamically emphasizing relevant features and suppressing irrelevant ones. This study introduces a novel application of attention mechanisms to smaller deep architectures, investigating whether smaller models can match the recognition performance of larger models in sensor-based human activity recognition systems while keeping resource usage low. For this purpose, we integrate the convolutional block attention module (CBAM) into a hybrid model, the deep convolutional and long short-term memory (DeepConvLSTM) network. Experiments are conducted on five public datasets with three model sizes: lightweight, moderate, and original. The results show that applying attention to the lightweight model enables it to achieve recognition performance similar to that of the moderate-size model, while the lightweight model requires approximately 2–13 times fewer parameters and 3.5 times fewer FLOPs. We also conduct experiments with sensor data at lower sampling rates and from fewer sensors attached to different body parts. The results show that attention improves recognition performance at lower sampling rates, as well as at higher sampling rates when model sizes are small, and that it mitigates the impact of missing data from one or more body parts, making the model more suitable for real-world sensor-based applications.
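Since the abstract only names the architectural components, the following is a minimal sketch of how a CBAM block can be inserted into a DeepConvLSTM-style network, written in PyTorch under assumptions not stated in the abstract: the class name AttentiveConvLSTM, the filter count, kernel shapes, LSTM hidden size, reduction ratio, and the placement of attention after the convolutional stack are all illustrative choices, not the authors' exact configuration.

```python
# Hypothetical sketch: CBAM inserted into a DeepConvLSTM-style HAR model.
# All hyperparameters below are assumptions for illustration only.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x):  # x: (B, C, T, S) feature maps
        avg = self.mlp(x.mean(dim=(2, 3)))   # global average pooling branch
        mx = self.mlp(x.amax(dim=(2, 3)))    # global max pooling branch
        scale = torch.sigmoid(avg + mx)[:, :, None, None]
        return x * scale                     # reweight feature maps per channel

class SpatialAttention(nn.Module):
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):  # x: (B, C, T, S)
        avg = x.mean(dim=1, keepdim=True)    # pool across feature maps
        mx = x.amax(dim=1, keepdim=True)
        scale = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * scale                     # emphasize informative time/sensor cells

class CBAM(nn.Module):
    """Convolutional Block Attention Module: channel then spatial attention."""
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.channel = ChannelAttention(channels, reduction)
        self.spatial = SpatialAttention()

    def forward(self, x):
        return self.spatial(self.channel(x))

class AttentiveConvLSTM(nn.Module):
    """DeepConvLSTM-style backbone with CBAM after the conv stack (assumed
    placement). num_filters=32 approximates a 'lightweight' configuration."""
    def __init__(self, n_sensor_channels: int, n_classes: int, num_filters: int = 32):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, num_filters, kernel_size=(5, 1)), nn.ReLU(inplace=True),
            nn.Conv2d(num_filters, num_filters, kernel_size=(5, 1)), nn.ReLU(inplace=True),
        )
        self.attn = CBAM(num_filters)
        self.lstm = nn.LSTM(num_filters * n_sensor_channels, 128,
                            num_layers=2, batch_first=True)
        self.head = nn.Linear(128, n_classes)

    def forward(self, x):  # x: (B, 1, T, S) windows of raw sensor readings
        h = self.attn(self.conv(x))              # (B, F, T', S)
        h = h.permute(0, 2, 1, 3).flatten(2)     # (B, T', F*S) sequence for the LSTM
        out, _ = self.lstm(h)
        return self.head(out[:, -1])             # classify from the last timestep

model = AttentiveConvLSTM(n_sensor_channels=9, n_classes=6)
logits = model(torch.randn(4, 1, 64, 9))         # 64-sample windows, 9 channels
```

The design choice the sketch illustrates is that CBAM adds only a small parameter overhead (two linear layers and one 7x7 convolution), which is why it can be attached to a downsized backbone without negating the resource savings the study targets.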