The paper presents an approach to recognizing human actions that adds a preprocessing stage for the input data. As the volume of video data grows, it is not always possible to keep data quality high, and this can limit the subsequent processing of the digital data. It therefore becomes important to add an image enhancement stage to the action recognition pipeline. The proposed method consists of three main steps: image enhancement, descriptor construction, and classification. The image enhancement stage combines local and global image processing in the frequency domain. The key idea of the local alpha-rooting method is to apply it to disjoint blocks of different sizes. To construct the descriptor, the three-dimensional dense microblock difference (3D DMD) algorithm is used. It represents image regions at several orientations and scales by densely capturing microblocks within each region. 3D DMD has several advantages over other methods: higher efficiency than existing approaches, low computational cost when an integral image is used, low dimensionality, ease of implementation, and no need for parameter tuning. The proposed modification improves recognition performance by 2 to 4%.
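To make the enhancement stage more concrete, here is a minimal NumPy sketch of alpha-rooting in its standard form. Each 2-D Fourier coefficient F(u, v) is replaced by |F(u, v)|^α · exp(i·∠F(u, v)), so the phase is kept and the magnitude is compressed for 0 < α < 1. The sketch applies this globally and also to disjoint tiles of several sizes. Several details are hypothetical and not taken from the paper: the function names, the default α = 0.9, the block sizes (8, 16, 32), and the rule for combining the results (a plain average of min-max normalized maps).

```python
import numpy as np


def alpha_root(x: np.ndarray, alpha: float) -> np.ndarray:
    """Alpha-rooting: replace each 2-D Fourier coefficient F by |F|**alpha * exp(i*angle(F))."""
    F = np.fft.fft2(x)
    mag = np.abs(F)
    scale = np.zeros_like(mag)
    # F * |F|**(alpha - 1) keeps the phase and raises the magnitude to alpha;
    # zero-magnitude coefficients are left at zero to avoid division by zero.
    np.power(mag, alpha - 1.0, out=scale, where=mag > 0)
    return np.real(np.fft.ifft2(F * scale))


def normalize(x: np.ndarray) -> np.ndarray:
    """Min-max normalize to [0, 1]; a constant map becomes all zeros."""
    lo, hi = float(x.min()), float(x.max())
    if hi - lo < 1e-12:
        return np.zeros_like(x, dtype=float)
    return (x - lo) / (hi - lo)


def local_alpha_root(img: np.ndarray, alpha: float, block: int) -> np.ndarray:
    """Apply alpha-rooting independently to disjoint block x block tiles (edge tiles may be smaller)."""
    out = np.empty(img.shape, dtype=float)
    h, w = img.shape
    for r in range(0, h, block):
        for c in range(0, w, block):
            tile = img[r:r + block, c:c + block]
            out[r:r + block, c:c + block] = alpha_root(tile, alpha)
    return out


def enhance(img: np.ndarray, alpha: float = 0.9, block_sizes=(8, 16, 32)) -> np.ndarray:
    """Hypothetical combination rule: average the normalized global map and the local maps."""
    img = img.astype(float)
    maps = [normalize(alpha_root(img, alpha))]
    maps += [normalize(local_alpha_root(img, alpha, b)) for b in block_sizes]
    return np.mean(maps, axis=0)


if __name__ == "__main__":
    frame = np.random.default_rng(0).random((64, 64))  # stand-in for a grayscale video frame
    enhanced = enhance(frame)
    print(enhanced.shape, float(enhanced.min()), float(enhanced.max()))
```

With α = 1, alpha-rooting leaves the image unchanged. Smaller α flattens the magnitude spectrum and boosts high-frequency detail relative to low frequencies. Running the transform on separate tiles lets the enhancement adapt to local content, and using several tile sizes trades locality against frequency resolution.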