This paper addresses the following problems. For action recognition based on a 3D diffusion model convolutional neural network, the whole detection process proceeds from fine to coarse using a bottom-up approach. To improve human skeleton detection accuracy, a multibranch, multistage cascaded CNN structure is proposed. This structure enables the model to learn the relationships between human body joints from the original image and to predict occluded parts effectively: it predicts skeleton point positions and skeleton point association information simultaneously, and it refines the detection results iteratively. For the problem of combining discrete skeleton points, the limb parts formed between skeleton points are taken as information carriers, the skeleton point association information is modeled as a vector field, and this field is treated as a feature, so that the relationships between different skeleton points are obtained by detection. The reorganization of discrete skeleton points in multiperson scenes is noted to be an NP-hard problem that can be simplified by decomposing it into a set of bipartite graph matching subproblems. On this basis, a matching algorithm for discrete skeleton points is proposed and optimized for the skeleton dislocation and algorithmic problems that arise under human occlusion. Compared with traditional two-dimensional images, audio, video, and other multimedia data, 3D diffusion model data describe the 3D geometric and morphological information of the target scene and are not affected by lighting changes, rotation, or scale transformation of the target, and can therefore describe real scenes more comprehensively and faithfully.
With the continuous updating of diffusion model acquisition equipment, the rapid development of 3D reconstruction technology, and the steady growth of computing power, applying 3D diffusion models to the detection and extraction of human skeletons in sports dance videos has become an active research direction in computer vision and computer graphics. Within this area, feature detection and description and model alignment for 3D nonrigid models form a fundamental and challenging problem of considerable research value, and it has received wide attention from the academic community.
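
The idea of scoring limbs with a vector field and then matching skeleton points between two joint types can be illustrated with a small sketch. It is not the paper's algorithm: the names `limb_score`, `match_limbs`, and `toy_field` are hypothetical, the field is a hand-written toy rather than a network output, and a simple greedy strategy is assumed for the bipartite matching step.

```python
import math

def limb_score(p, q, field, samples=10):
    """Mean alignment between the field and the unit vector p->q along the segment."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    length = math.hypot(dx, dy)
    if length == 0:
        return 0.0
    ux, uy = dx / length, dy / length
    total = 0.0
    for i in range(samples):
        t = i / (samples - 1)                  # evenly spaced points, endpoints included
        fx, fy = field(p[0] + t * dx, p[1] + t * dy)
        total += fx * ux + fy * uy             # dot product with the limb direction
    return total / samples

def match_limbs(starts, ends, field, min_score=0.0):
    """Greedy bipartite matching (an assumption): best-scoring pairs first, each point used once."""
    candidates = []
    for i, p in enumerate(starts):
        for j, q in enumerate(ends):
            s = limb_score(p, q, field)
            if s > min_score:
                candidates.append((s, i, j))
    candidates.sort(reverse=True)
    used_i, used_j, pairs = set(), set(), []
    for s, i, j in candidates:
        if i not in used_i and j not in used_j:
            used_i.add(i)
            used_j.add(j)
            pairs.append((i, j))
    return sorted(pairs)

def toy_field(x, y):
    """Hypothetical field: points along +x near two horizontal limbs at y=0 and y=20."""
    if abs(y) < 1 or abs(y - 20) < 1:
        return (1.0, 0.0)
    return (0.0, 0.0)

elbows = [(0, 0), (0, 20)]    # candidate start points from two people
wrists = [(10, 20), (10, 0)]  # candidate end points, in arbitrary order
pairs = match_limbs(elbows, wrists, toy_field)
print(pairs)
```

Solving one such matching per limb type is what turns the multiperson assembly problem into a set of smaller bipartite subproblems. An optimal assignment solver could replace the greedy step where accuracy matters more than speed.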