In recent years, human activity recognition (HAR) based on wearable devices has been widely applied in health applications and other fields. Most current HAR models are based on Convolutional Neural Networks (CNNs), Long Short-Term Memory (LSTM) networks, or a combination of the two, and models based on the Transformer and its variants have been proposed more recently. However, these models arrange their layers in a sequential network structure and cannot attend to local and global features simultaneously, which reduces recognition performance. In addition, Transformers require substantial computational resources, which makes them unsuitable for resource-constrained devices. The primary distinction between our model and other hybrid models that combine a CNN with a Transformer is that ours adopts a new parallel network architecture and focuses primarily on lightweight design. Specifically, we propose the Mobile Human Activity Recognition Conformer (MobileHARC), which runs a lightweight Transformer and a CNN in parallel as its backbone networks. We further propose the Inverted Residual Lightweight Convolution Block and the Multiscale Lightweight Multi-Head Self-Attention Mechanism. We systematically evaluate the proposed models on four public datasets. Experimental results show that MobileHARC achieves superior recognition performance while requiring fewer floating-point operations (FLOPs) and parameters than current models.
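The abstract does not specify the internal layout of the Inverted Residual Lightweight Convolution Block. The sketch below assumes, purely for illustration, a MobileNetV2-style inverted residual applied to 1D sensor sequences (1x1 expansion, depthwise k-tap convolution, 1x1 projection) and counts weights and multiply-accumulates (MACs) against a plain 1D convolution. It shows where the parameter and FLOP savings of such a block can come from. The function names, the expansion factor `t`, and the example sizes are hypothetical, and bias and normalization terms are ignored.

```python
def conv1d_params(c_in: int, c_out: int, k: int) -> int:
    """Weights of a standard 1D convolution (bias omitted)."""
    return c_in * c_out * k


def inverted_residual_params(c_in: int, c_out: int, k: int, t: int) -> int:
    """Weights of a hypothetical MobileNetV2-style inverted residual block:
    1x1 expand -> depthwise k-tap conv -> 1x1 project (bias/BN omitted)."""
    hidden = t * c_in
    expand = c_in * hidden      # pointwise expansion to t * c_in channels
    depthwise = hidden * k      # one k-tap filter per hidden channel
    project = hidden * c_out    # pointwise projection back to c_out
    return expand + depthwise + project


def macs(weights_per_step: int, seq_len: int) -> int:
    """MACs for stride-1, 'same'-padded 1D layers: every weight is used
    once per output time step (FLOPs are roughly 2 * MACs)."""
    return weights_per_step * seq_len


if __name__ == "__main__":
    c, k, t, seq_len = 64, 9, 2, 128  # illustrative sizes only
    std = conv1d_params(c, c, k)
    inv = inverted_residual_params(c, c, k, t)
    print("standard conv:     ", std, "weights,", macs(std, seq_len), "MACs")
    print("inverted residual: ", inv, "weights,", macs(inv, seq_len), "MACs")
```

With these sizes the standard convolution has 36,864 weights versus 17,536 for the inverted residual block. The saving depends on the kernel size and expansion factor: with `t = 6` the same block would have 52,608 weights, more than the plain convolution, so a lightweight design has to keep the expansion modest.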