Action recognition (AR) has many applications, including surveillance, healthcare and disability care, human-machine interaction, video-content-based monitoring, and activity recognition. Because human action videos contain large numbers of frames, practical models must minimize computation by reducing the number, size, and resolution of the frames processed. We propose an improved method for recognizing human actions in small, low-resolution videos that combines convolutional neural networks (CNNs) with channel attention mechanisms (CAMs) and autoencoders (AEs). The channel attention mechanism emphasizes the blocks carrying the most representative features, so the convolutional layers extract more discriminative features across the network. In addition, frames are randomly sampled before the main processing stage, which improves accuracy while requiring less data. The goal is to increase performance while mitigating overfitting, computational complexity, and uncertainty through the combined CNN-CAM and AE architecture. Identifying the patterns and features associated with high-level performance is the next step. To validate the method, low-resolution, low-size video frames from the UCF50, UCF101, and HMDB51 datasets were used. The algorithm also has relatively low computational complexity. Consequently, the proposed method performs well compared to similar methods, achieving accuracies of 77.29%, 98.87%, and 97.16% on the HMDB51, UCF50, and UCF101 datasets, respectively. These results indicate that the method can effectively classify human actions, and that it can serve as a processing model for low-resolution, small video frames.
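The two ingredients named above, random frame sampling before processing and channel attention over CNN feature maps, can be sketched in a few lines. This is an illustrative numpy sketch, not the paper's actual implementation: the squeeze-and-excitation-style gating shown here is one common form of channel attention, and the random weights stand in for parameters that would be learned during training.

```python
import numpy as np

def sample_frames(video, n_frames, rng=None):
    """Randomly pick n_frames from a video (list of frames), keeping temporal order.

    Illustrative stand-in for the pre-processing frame sampling described in the text.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    idx = np.sort(rng.choice(len(video), size=n_frames, replace=False))
    return [video[i] for i in idx]

def channel_attention(feature_map, reduction=4, rng=None):
    """Squeeze-and-excitation-style channel attention over a (C, H, W) feature map.

    The weights w1/w2 are random placeholders for learned parameters; in a trained
    CNN they would be fit by backpropagation.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    c = feature_map.shape[0]
    # Squeeze: global average pooling over the spatial dimensions -> (C,)
    squeezed = feature_map.mean(axis=(1, 2))
    # Excitation: bottleneck MLP (C -> C/reduction -> C) with ReLU then sigmoid
    w1 = rng.standard_normal((c // reduction, c)) * 0.1
    w2 = rng.standard_normal((c, c // reduction)) * 0.1
    hidden = np.maximum(w1 @ squeezed, 0.0)            # ReLU
    gates = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))       # per-channel gate in (0, 1)
    # Scale: reweight each channel so informative channels dominate
    return feature_map * gates[:, None, None]
```

The attended feature map keeps the input shape; only the relative emphasis of the channels changes, which is what lets downstream layers focus on the most discriminative features.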