Human activity recognition (HAR) is an increasingly active field of study within the computer vision community. One application of HAR is detecting driver behavior to support safe travel. This study uses a capsule network (CapsNet) with leave-one-subject-out validation to identify driving behavior. The proposed method consists of two parts, an encoder and a decoder. The encoder modifies Sabour’s capsule network architecture by adding a convolution layer before the primary capsule layer. The method is evaluated on a primary dataset with 10 classes and 300 images per class. The dataset is split using both hold-out validation and leave-one-subject-out validation, and the resulting models are compared with a conventional CNN architecture. The proposed method achieves an accuracy of 97.83 % on the dataset split using hold-out validation. However, the accuracy decreased by 53.11 % when the dataset was split using leave-one-subject-out validation. This is because the proposed method extracts all features contained in the input image, including the attributes of each participant, and these do not carry over to the unseen subject in the leave-one-subject-out (user-independent) setting. Thus, the resulting model tends to overfit.
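The difference between the two validation schemes can be illustrated with a minimal sketch. It is not the study's code: the function names `hold_out_split` and `leave_one_subject_out`, the 20 % test fraction, the number of subjects (5), and the round-robin assignment of images to subjects are all assumptions made for illustration. Only the dataset size (10 classes with 300 images each) comes from the text.

```python
import random
from collections import defaultdict


def hold_out_split(n_samples, test_fraction=0.2, seed=0):
    """Random hold-out split: images of one subject can land on both sides."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = int(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices), one fold per subject."""
    by_subject = defaultdict(list)
    for idx, subject in enumerate(subject_ids):
        by_subject[subject].append(idx)
    for held_out in sorted(by_subject):
        test_idx = by_subject[held_out]
        train_idx = [
            i
            for subject, idxs in by_subject.items()
            if subject != held_out
            for i in idxs
        ]
        yield held_out, train_idx, test_idx


# Dataset size from the text: 10 classes x 300 images.
N_IMAGES = 10 * 300
# Assumption: 5 subjects, assigned round-robin (the real subject count is not given).
N_SUBJECTS = 5
subject_ids = [i % N_SUBJECTS for i in range(N_IMAGES)]

# Hold-out: count the subjects that appear in both the training and test sets.
train_idx, test_idx = hold_out_split(N_IMAGES)
shared = {subject_ids[i] for i in train_idx} & {subject_ids[i] for i in test_idx}
print("subjects shared by hold-out train/test:", len(shared))

# Leave-one-subject-out: the test subject never appears in training.
folds = list(leave_one_subject_out(subject_ids))
for held_out, fold_train, fold_test in folds:
    print(f"held-out subject {held_out}: train={len(fold_train)}, test={len(fold_test)}")
```

Under hold-out validation the same participants appear on both sides of the split, so a model can score well by memorizing participant-specific attributes. Under leave-one-subject-out validation that shortcut is unavailable, which is consistent with the accuracy drop reported above.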