Recently, deep neural networks have been used to recognize human activity and gait from mobile-sensor data, attracting considerable attention. Although existing deep neural networks that perform automatic feature extraction achieve good performance, their classification accuracy can still be improved. In this paper, a deep neural network that combines a set of convolutional layers with a capsule network is proposed. The proposed architecture, named DCapsNet, automatically extracts activity and gait features from built-in sensors and classifies them. The convolutional layers of DCapsNet are well suited to processing temporal sequences but produce scalar outputs that do not preserve equivariance. The capsule network (CapsNet) is then trained with a dynamic routing algorithm to capture equivariant features, each having a magnitude and an orientation, which improves the model's classification performance. The performance of the proposed model is evaluated on four public datasets: two HAR datasets (UCI-HAR and WISDM) and two gait datasets (WhuGAIT Dataset #1 and Dataset #2). The recognition accuracies of the proposed model on the UCI-HAR and WISDM datasets are 97.92% and 99.30%, respectively, and on WhuGAIT Dataset #1 and Dataset #2 are 94.75% and 97.16%, respectively. Experimental results show that the proposed model achieves higher recognition accuracy than the reported results of state-of-the-art models.
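The dynamic routing step mentioned above can be illustrated with a minimal sketch. The code below is a simplified, NumPy-only illustration of routing-by-agreement between capsule layers (following the general scheme of Sabour et al.), not the paper's actual DCapsNet implementation; all shapes, the `num_iters` default, and the function names are illustrative assumptions.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    # Squash non-linearity: shrinks a vector's norm into [0, 1) while keeping
    # its direction, so a capsule's length can encode presence probability
    # and its orientation can encode the entity's pose.
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def dynamic_routing(u_hat, num_iters=3):
    # u_hat: prediction vectors from lower-level capsules,
    #        shape (num_in, num_out, dim_out).
    # Routing-by-agreement: coupling coefficients are iteratively shifted
    # toward output capsules whose vectors agree with the predictions.
    num_in, num_out, _ = u_hat.shape
    b = np.zeros((num_in, num_out))  # routing logits
    for _ in range(num_iters):
        # softmax over output capsules gives the coupling coefficients
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)
        s = (c[..., None] * u_hat).sum(axis=0)   # weighted sum -> (num_out, dim_out)
        v = squash(s)                            # output capsule vectors
        b = b + (u_hat * v[None]).sum(axis=-1)   # agreement update
    return v

# Example: route 32 lower-level capsules to 10 output capsules of dimension 16.
v = dynamic_routing(np.random.randn(32, 10, 16))
```

Each output capsule's vector then has a norm strictly below 1, interpretable as the probability that the corresponding class (activity or gait pattern) is present.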