Reliable recognition of human movements has a wide range of applications, including games, human-computer interaction, security, and healthcare. In recent years, computer graphics and computer vision researchers have developed many new motion-capture algorithms that run on simpler hardware and with far fewer limitations than before. The objective of this paper is to improve the accuracy of estimating the positions of human skeleton joints in video sequences. The proposed method consists of five blocks. The input image sequence is fed to the tracking and motion compensation unit, where the tracking algorithm determines the object displacement and centers the object within the frame. The motion information is also passed to an additional point-of-view estimation unit. This unit calculates the motion angles in the frame and monitors the object size, thereby determining whether the object is approaching or moving away from the camera, and then feeds these data to the neural network. The network consists of three convolutional layers, each followed by a pooling layer. The last pooling layer connects to a cascade of three fully connected layers. All activation functions in these layers are ReLU, except in the last layer, where a linear activation is used. HOG3D features serve as the input to the first convolutional layer. The data from the point-of-view estimation unit and from the tracking and motion compensation unit are fed directly to the input of the fully connected layers. To cope with inaccurate or undetected joint positions, the method uses an additional procedure that identifies unreliable joints and extrapolates their new positions from the previous ones using an additional neural network.
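As an illustration of the kind of quantities the point-of-view estimation unit could derive from the tracker output, the following minimal sketch computes a centering shift, an in-frame motion angle, and a scale trend from two consecutive bounding boxes. The box format, the names `Box`, `ViewCues`, `view_cues`, and `center_offset`, and the 5% tolerance are assumptions made for this example and are not taken from the method itself.

```python
import math
from typing import NamedTuple, Tuple


class Box(NamedTuple):
    """Hypothetical tracker output: box centre (cx, cy) and size (w, h) in pixels."""
    cx: float
    cy: float
    w: float
    h: float


class ViewCues(NamedTuple):
    dx: float     # horizontal displacement of the box centre between frames
    dy: float     # vertical displacement of the box centre between frames
    angle: float  # in-frame motion direction in radians, atan2(dy, dx)
    scale: float  # area ratio current/previous; above 1 means the box grew
    trend: str    # "approaching", "receding" or "steady"


def center_offset(box: Box, frame_w: int, frame_h: int) -> Tuple[float, float]:
    """Shift that moves the box centre to the frame centre (motion compensation)."""
    return frame_w / 2 - box.cx, frame_h / 2 - box.cy


def view_cues(prev: Box, cur: Box, tol: float = 0.05) -> ViewCues:
    """Derive motion angle and scale trend from two consecutive boxes.

    `tol` is an assumed dead band around a scale ratio of 1.
    """
    prev_area = prev.w * prev.h
    if prev_area <= 0:
        raise ValueError("previous box must have positive area")
    dx = cur.cx - prev.cx
    dy = cur.cy - prev.cy
    scale = (cur.w * cur.h) / prev_area
    if scale > 1 + tol:
        trend = "approaching"   # object appears larger, so it moves towards the camera
    elif scale < 1 - tol:
        trend = "receding"      # object appears smaller, so it moves away
    else:
        trend = "steady"
    return ViewCues(dx, dy, math.atan2(dy, dx), scale, trend)
```

In such a sketch, the values in `ViewCues` would be the auxiliary data concatenated with the flattened convolutional features at the input of the fully connected layers.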
The proposed structure is expected to improve the accuracy of position prediction for the following reasons: taking into account the motion angles and the change in scale makes it possible to distinguish movements that look similar in centered frames but differ in displacement; an adaptive window size is used for the HOG3D features; and a neural network extrapolates the joint positions when a prediction is absent or has low accuracy. Experiments on the HumanEva-1 dataset confirmed that the suggested modifications achieve higher accuracy, which indicates that the proposed modified method is promising for predicting body position in motion recognition systems.
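The fallback for unreliable joints can be illustrated with a simplified sketch. Note that it replaces the additional neural network with plain constant-velocity extrapolation, and that the reliability test (a missing joint, or a jump larger than `max_jump` pixels from the previous frame) is an assumption made for this example; the names `extrapolate` and `repair_joints` are likewise hypothetical.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def extrapolate(prev2: Point, prev1: Point) -> Point:
    """Constant-velocity guess: repeat the last observed step.

    Stand-in for the extrapolating neural network, not the network itself.
    """
    return (2 * prev1[0] - prev2[0], 2 * prev1[1] - prev2[1])


def repair_joints(
    prev2: List[Point],
    prev1: List[Point],
    current: List[Optional[Point]],
    max_jump: float = 30.0,
) -> List[Point]:
    """Replace missing or implausible joints with extrapolated positions.

    A joint counts as unreliable here if it is None (undetected) or if it
    moved more than `max_jump` pixels since the previous frame (assumed rule).
    """
    repaired = []
    for p2, p1, cur in zip(prev2, prev1, current):
        unreliable = cur is None or math.hypot(cur[0] - p1[0], cur[1] - p1[1]) > max_jump
        repaired.append(extrapolate(p2, p1) if unreliable else cur)
    return repaired
```

A learned extrapolator would take the place of `extrapolate` while the surrounding detect-and-replace logic stays the same.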