Abstract

This paper provides an in-depth study and analysis of human artistic poses through intelligently enhanced multimodal artistic pose recognition. A complementary network model architecture for multimodal information based on motion energy is proposed. The network exploits both the rich appearance features provided by RGB data and the depth information provided by depth data, which is robust to changes in illumination and viewing angle; multimodal fusion is accomplished through the complementary information of the two modalities. Moreover, to better model long-range temporal structure while accounting for action classes that share sub-actions, an energy-guided video segmentation method is employed. In the feature fusion stage, a cross-modal cross-fusion approach is proposed that allows the convolutional network to share local features of the two modalities in the shallow layers and, by connecting the feature maps of multiple convolutional layers, to fuse global features in the deep layers. First, a Kinect camera is used to acquire color image data, depth image data, and 3D skeletal joint coordinates of the human body with the OpenPose open-source framework. Keyframes are then extracted automatically based on the distance between the hand and the head; relative distance features are extracted from the keyframes to describe the action, while local occupancy pattern (LOP) features and HSV color space features are extracted to describe the object. Finally, feature fusion is performed and the complex action recognition task is completed.
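The keyframe step above can be sketched as follows. This is a minimal illustration, assuming skeletal data as a (frames, joints, 3) array and selecting frames where the hand-head distance reaches a local extremum; the function name, joint indices, and extremum criterion are assumptions, as the paper does not specify its exact rule.

```python
import numpy as np

def extract_keyframes(skeleton_seq, hand_idx, head_idx, min_gap=5):
    """Select keyframes where the hand-head distance hits a local extremum.

    skeleton_seq: (T, J, 3) array of 3D joint coordinates per frame.
    hand_idx, head_idx: joint indices (Kinect/OpenPose layout assumed).
    min_gap: minimum frame spacing between successive keyframes.
    """
    # Per-frame Euclidean distance between the hand and head joints.
    dist = np.linalg.norm(skeleton_seq[:, hand_idx] - skeleton_seq[:, head_idx], axis=1)
    keyframes = []
    for t in range(1, len(dist) - 1):
        is_peak = dist[t] >= dist[t - 1] and dist[t] >= dist[t + 1]
        is_valley = dist[t] <= dist[t - 1] and dist[t] <= dist[t + 1]
        # Keep extrema that are far enough from the previous keyframe.
        if (is_peak or is_valley) and (not keyframes or t - keyframes[-1] >= min_gap):
            keyframes.append(t)
    return keyframes
```

Relative distance, LOP, and HSV features would then be computed only at the returned frame indices, which is what makes the downstream fusion tractable on long sequences.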
To solve the consistency problem of virtual-reality fusion, the mapping relationship between hand joint coordinates and the virtual scene is determined in the augmented reality scene, and a coordinate consistency model between the natural hand and the virtual model is established. Finally, real-time interaction between hand gestures and the virtual model is realized; the average gesture recognition accuracy reaches 99.04%, improving the robustness and real-time performance of hand gesture recognition.
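The hand-to-virtual-scene mapping can be sketched as a single rigid alignment. This is a hypothetical formulation, assuming the coordinate consistency model is a similarity transform calibrated offline; the function and parameter names are illustrative, not taken from the paper.

```python
import numpy as np

def hand_to_virtual(joint_cam, scale, rotation, translation):
    """Map a 3D hand joint from camera space into virtual-scene space.

    Models the natural-hand / virtual-model coordinate consistency as a
    similarity transform p' = s * R @ p + t, with s (scalar), R (3x3
    rotation), and t (3-vector) obtained from an offline calibration step.
    """
    return scale * rotation @ np.asarray(joint_cam, dtype=float) + translation
```

At runtime, each tracked hand joint would pass through this transform every frame, so the virtual model stays registered to the real hand as it moves.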

Highlights

  • With the development and application of artificial intelligence, machine vision, human-computer interaction (HCI), and other technologies, computers are rapidly being integrated into our lives and are gradually changing our lifestyles; HCI technology, which allows close contact between humans and machines, continues to produce innovations. Human intelligence enhancement serves as a special language for communication between humans and computers, using body movements, facial expressions, etc., to express specific interaction intentions and exchange information between humans and computers (Abdullah et al., 2021)

  • From a granular computing perspective, five different wrapper feature selection methods are used to select five optimal feature subsets from the original feature set; five homogeneous classifiers trained on these subsets are merged into a local ensemble, seven heterogeneous classification algorithms are used to construct seven local ensembles that merge into a global ensemble, and the final output is produced by voting among the classifiers

  • Keyframe extraction is first performed on the temporal sequence of skeletal data; relative position features are extracted from the skeletal data at keyframes to describe the spatiality of the action, followed by HSV color histogram features from RGB data and LOP features from depth data to describe the object
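The HSV color feature in the last highlight can be sketched as a quantized 3D histogram. This is a minimal sketch: the bin counts, the value ranges, and the assumption that the input is already in HSV are all illustrative choices, since the paper does not specify them.

```python
import numpy as np

def hsv_histogram(hsv_image, bins=(8, 4, 4)):
    """L1-normalized, quantized HSV color histogram.

    hsv_image: (H, W, 3) array with hue in [0, 360), saturation and
    value in [0, 1]. Returns a flat feature vector of length prod(bins).
    """
    h, s, v = hsv_image[..., 0], hsv_image[..., 1], hsv_image[..., 2]
    # Joint 3D histogram over (hue, saturation, value).
    hist, _ = np.histogramdd(
        np.stack([h.ravel(), s.ravel(), v.ravel()], axis=1),
        bins=bins, range=((0, 360), (0, 1), (0, 1)))
    # Normalize so the feature is independent of image size.
    return hist.ravel() / hist.sum()
```

In the fusion stage described above, this vector would be concatenated with the skeletal relative-position features and the depth-based LOP features before classification.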

Introduction

With the development and application of artificial intelligence, machine vision, human-computer interaction (HCI), and other technologies, computers are rapidly being integrated into our lives and are gradually changing our lifestyles. HCI technology, which allows close contact between humans and machines, continues to produce innovations. Human intelligence enhancement serves as a special language for communication between humans and computers, using body movements, facial expressions, etc., to express specific interaction intentions and exchange information between humans and computers (Abdullah et al., 2021). HCI has gradually evolved toward behaviors issued autonomously by humans, such as gestures, voice, body postures, and facial expressions. This shift changes the human-computer interaction interface from traditional to intelligent: from graphical and network user interfaces to multimodal and multimedia intelligent interfaces, from unilateral human-computer interaction to cognitive understanding of the human, and from satisfying basic functions to emphasizing user experience, making the interface more natural and diverse (Ludl et al., 2020). The realization process mainly includes two parts: feature extraction and classifier classification. These problems cannot be solved by unimodal interaction in AR alone; the main goals at present are to make virtual-real interaction in virtual experiments more efficient and to enhance students' sense of the authenticity of experiments.
For domestic secondary school students' virtual chemistry experiments, there is an urgent need for a new intelligent interactive experimental system that supports normal operation of the experimental process while remaining easy to operate, easy to observe, and low in cost (Jain et al., 2021).
