The development of intelligent video surveillance systems is an area of active research that yields solutions tailored to specific environments. Several problems in this area remain open. One of them is the recognition of complex actions, which consist of sequences of elementary actions and, as a rule, are difficult to classify from a single video frame. The present study addresses the recognition of complex actions in video recordings. The aim of the work is to develop a pipeline for recognizing complex actions performed by an observed object in video recordings. The novelty of the work lies in modeling actions as sequences of elementary actions and in combining neural networks with stochastic models. The proposed solution can be used to develop intelligent video surveillance systems that ensure security at production facilities, including oil and gas industry facilities. We analyzed video recordings of objects performing various actions. The features that describe complex actions and their properties are identified. The problem of recognizing complex actions represented by a sequence of elementary actions is formulated. As a result, we developed a pipeline that implements a combined approach. Elementary actions are described using a skeletal model in graphical form. Each elementary action is recognized using a convolutional neural network, and complex actions are then modeled using a hidden Markov model. The developed pipeline was tested on videos of students, whose actions were divided into two categories: cheating and ordinary actions. In the experiments, the classification accuracy for elementary actions was 0.69, and the accuracy of the binary classification of complex actions was 0.71. 
In addition, the limitations of the developed pipeline are identified and directions for improving the applied approaches are outlined, in particular the study of noise immunity.
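
The second stage of the pipeline, in which a hidden Markov model (HMM) is applied to the sequence of recognized elementary actions, can be illustrated with a minimal Python sketch. It assumes that the convolutional network has already produced one elementary-action label per video segment, and it uses one discrete HMM per category, choosing the category with the higher likelihood. This decision scheme, the three elementary-action labels, all probability values, and the names `log_likelihood` and `classify` are hypothetical and are not taken from the study; in practice the model parameters would be estimated from training sequences.

```python
import math

# Hypothetical elementary-action labels (stand-ins for the CNN output):
# 0 = writing, 1 = looking aside, 2 = using a phone.

# Each model is (start, trans, emit) for a 2-state discrete HMM.
# All numbers are illustrative assumptions, not values from the study.
ORDINARY = (
    [0.8, 0.2],
    [[0.9, 0.1], [0.5, 0.5]],
    [[0.85, 0.10, 0.05], [0.50, 0.40, 0.10]],
)
CHEATING = (
    [0.5, 0.5],
    [[0.6, 0.4], [0.3, 0.7]],
    [[0.60, 0.30, 0.10], [0.10, 0.40, 0.50]],
)


def log_likelihood(obs, start, trans, emit):
    """Return log P(obs | model) using the scaled forward algorithm."""
    if not obs:
        raise ValueError("observation sequence must not be empty")
    n = len(start)
    # Initialization: alpha_0(i) = pi_i * b_i(o_0)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    log_p = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Recursion: alpha_t(j) = sum_i alpha_{t-1}(i) * a_ij * b_j(o_t)
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                for j in range(n)
            ]
        # Rescale at every step to avoid numerical underflow on long sequences;
        # the logs of the scaling factors sum to the sequence log-likelihood.
        scale = sum(alpha)
        log_p += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_p


def classify(obs):
    """Label a sequence of elementary actions as 'cheating' or 'ordinary'."""
    ll_ordinary = log_likelihood(obs, *ORDINARY)
    ll_cheating = log_likelihood(obs, *CHEATING)
    return "cheating" if ll_cheating > ll_ordinary else "ordinary"


if __name__ == "__main__":
    print(classify([0, 0, 0, 0, 0]))
    print(classify([2, 1, 2, 2, 1]))
```

Because the decision compares likelihoods of whole sequences, a single misrecognized elementary action does not necessarily change the final category, which is one reason a sequence model can complement a per-frame classifier of moderate accuracy.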