With the rapid development of computer technology, the information revolution continues to change the way we live. Virtual reality is no longer limited to viewing virtual models: users increasingly want to interact directly with the virtual environment to strengthen their sense of immersion.
Today, interaction in virtual reality is performed mainly through the handheld controllers supplied with VR headsets. Earlier research on gesture interaction in virtual environments used data gloves to capture hand movements, but these gloves are heavy and restrict the natural motion of the hand. To overcome this limitation, low-cost, vision-based gesture interaction for virtual reality is increasingly needed.
Traditional gesture recognition approaches based on hand-crafted models include template matching, data gloves, and machine learning. Among machine learning classifiers, the Hidden Markov Model (HMM) and the Support Vector Machine (SVM) [1] have been widely used. However, images of hand motion captured by ordinary cameras are susceptible to noise, lighting changes, and other factors. These seriously degrade classifier performance and make it difficult to detect the hand accurately and efficiently when the background is complex or the lighting varies. In recent years, deep learning-based object detection methods have achieved major breakthroughs [2], and over the past decade many deep learning-based dynamic gesture recognition algorithms have been proposed, including methods that aim to improve gesture classification accuracy under changing lighting conditions [3]. Therefore, this article combines a depth camera with the HTC VIVE device for gesture recognition in virtual reality.
First, a depth camera captures depth images of the hand, and these depth images are labeled. A deep learning network is then trained on the labeled images to learn hand movements [4]. During recognition, the detected hand image is preprocessed, hand detection locates the position of the hand in the image, and the hand movement is output as the recognition result. The algorithm in this paper builds on the deep learning network YOLOv3 and modifies both its network structure and its multiscale structure. When compared with other deep learning algorithms for gesture recognition, it achieves better recognition performance than the other algorithms in the same category.
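To make the detection step more concrete, the sketch below shows how a standard YOLOv3 head turns raw network outputs into hand bounding boxes at several scales, and how greedy non-maximum suppression (NMS) merges overlapping detections of the same hand. It follows the original YOLOv3 box formulation, not the modified network and multiscale structure proposed in this article, whose details are not given here. The 416×416 input size, the anchor (116, 90), the 0.45 IoU threshold, the raw prediction values, and the function names `decode_cell`, `iou`, and `nms` are illustrative assumptions; class scores are omitted on the assumption of a single "hand" class.

```python
import math
from typing import List, Tuple

# Decoded box: (x1, y1, x2, y2, objectness), coordinates in input-image pixels.
Box = Tuple[float, float, float, float, float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def decode_cell(t: Tuple[float, float, float, float, float],
                cx: int, cy: int,
                anchor: Tuple[float, float], stride: int) -> Box:
    """Decode one raw YOLOv3-style prediction (tx, ty, tw, th, to) at grid cell (cx, cy).

    The centre offset passes through a sigmoid so it stays inside the cell, and the
    width/height rescale the anchor prior exponentially, as in standard YOLOv3.
    """
    tx, ty, tw, th, to = t
    pw, ph = anchor
    bx = (sigmoid(tx) + cx) * stride  # box centre x in pixels
    by = (sigmoid(ty) + cy) * stride  # box centre y in pixels
    bw = pw * math.exp(tw)            # box width in pixels
    bh = ph * math.exp(th)            # box height in pixels
    return (bx - bw / 2, by - bh / 2, bx + bw / 2, by + bh / 2, sigmoid(to))


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two corner-format boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], iou_thresh: float = 0.45) -> List[Box]:
    """Greedy NMS: visit boxes by descending score, drop any that overlap a kept box too much."""
    kept: List[Box] = []
    for box in sorted(boxes, key=lambda b: b[4], reverse=True):
        if all(iou(box, k) < iou_thresh for k in kept):
            kept.append(box)
    return kept


# Hypothetical raw outputs for a 416x416 input. Standard YOLOv3 uses three heads with
# strides 32, 16 and 8 (13x13, 26x26 and 52x52 grids). The anchor (116, 90) is a
# placeholder; YOLOv3 normally derives anchors by k-means clustering of training boxes.
preds = [
    decode_cell((0.0, 0.0, 0.0, 0.0, 2.0), cx=6, cy=6, anchor=(116, 90), stride=32),
    decode_cell((0.0, 0.0, 0.0, 0.0, 1.0), cx=12, cy=12, anchor=(116, 90), stride=16),
    decode_cell((0.0, 0.0, 0.0, 0.0, 0.5), cx=2, cy=2, anchor=(116, 90), stride=32),
]
kept = nms(preds)
print(len(kept), kept[0][:4])  # the two overlapping boxes of the same hand collapse into one
```

Here the first two predictions, made by different scales for the same hand, overlap with an IoU of about 0.74, so only the higher-scoring one survives, while the third, non-overlapping box is kept as a separate detection. A modified multiscale structure such as the one described above would change which grids and anchors produce the raw predictions, but the decoding and suppression steps would typically follow the same pattern.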