In recent years, with the rapid development of computer and network technology, computer vision has come into wide use across many scientific fields. Human motion recognition, an important branch of computer vision, is essentially the task of correctly classifying the human motion information in motion images. It is of great significance in intelligent monitoring and security, human-computer interaction, motion analysis, and other fields. Current human motion recognition methods still face several problems. First, extracting and characterizing the motion information in images remains one of the main difficulties in this field. Second, with the appearance of Kinect and other depth cameras, depth information for human motion images has become available to researchers, and how to use this depth information effectively for human motion recognition and classification is an important research question. Third, it remains unclear how a deep learning network model can achieve a high recognition rate when the amount of sample data is small. Based on the UTD-MHAD database, this paper studies human motion recognition using RGB and depth images captured simultaneously by Kinect, and discusses and analyzes the above problems. It also uses micro-inertial sensors (the MTi-G-700 developed by Xsens, and the MEMS gyroscopes and accelerometers built into Android mobile phones, tablets, and other personal mobile devices) to correct image motion blur. A new mathematical model is built in which the inertial data obtained by the MIMU over a short time interval is used to estimate the position, attitude, and velocity of the camera motion. The image pixel positions are then corrected, the image is deblurred, and further processing such as denoising is applied to solve the motion blur problem. A new algorithm is developed and its validity is verified by MATLAB simulation.
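The idea of using short-interval inertial data to correct pixel positions can be illustrated with a minimal sketch. It is not the model developed in this paper: it assumes a pure-rotation camera (position and velocity are ignored), gyroscope samples at a fixed interval, and a hypothetical pinhole intrinsic matrix `K_cam`. All function names and numeric values are illustrative, and Python with NumPy is used here rather than MATLAB.

```python
import numpy as np

def skew(v):
    """Return the 3x3 skew-symmetric (cross-product) matrix of a 3-vector."""
    x, y, z = v
    return np.array([[0.0, -z, y],
                     [z, 0.0, -x],
                     [-y, x, 0.0]])

def rotation_from_gyro(omega, dt):
    """Integrate body-frame gyro samples (N x 3, rad/s) taken every dt seconds.

    Returns R, the orientation of the final camera frame expressed in the
    initial camera frame (it maps final-frame coordinates to initial-frame
    coordinates). Each step uses the Rodrigues rotation formula.
    """
    R = np.eye(3)
    for w in np.asarray(omega, dtype=float):
        rate = np.linalg.norm(w)
        theta = rate * dt
        if theta < 1e-12:
            continue  # negligible rotation in this step
        A = skew(w / rate)
        step = np.eye(3) + np.sin(theta) * A + (1.0 - np.cos(theta)) * (A @ A)
        R = R @ step
    return R

def displaced_pixel(K_cam, R, pixel):
    """Map a pixel from the start of the exposure to its position at the end.

    Assumption: pure camera rotation, so the mapping is the homography
    H = K_cam * R^T * K_cam^-1 (R^T converts initial-frame rays to the final frame).
    """
    H = K_cam @ R.T @ np.linalg.inv(K_cam)
    p = H @ np.array([pixel[0], pixel[1], 1.0])
    return p[:2] / p[2]

# Hypothetical intrinsics: 500 px focal length, principal point (320, 240).
K_cam = np.array([[500.0, 0.0, 320.0],
                  [0.0, 500.0, 240.0],
                  [0.0, 0.0, 1.0]])

# Ten gyro samples of 0.1 rad/s about the camera y axis, 10 ms apart.
omega = np.tile([0.0, 0.1, 0.0], (10, 1))
R = rotation_from_gyro(omega, dt=0.01)
end_pixel = displaced_pixel(K_cam, R, (320.0, 240.0))
```

The difference between `end_pixel` and the starting pixel approximates the blur displacement at that image location. A deblurring step could use such displacements to build a blur kernel, which this sketch does not attempt.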