Abstract: Integrating mixed reality simulation training into institutional education and teaching can yield significant quality benefits. Non-wearable interaction offers a more natural form of hand interaction in flight cockpits and spans several emerging fields, including mixed reality, computer vision, and human-computer interaction. Guided by a "human-centered" approach to natural human-computer interaction, this paper proposes a virtual-hand mixed reality interaction scheme based on natural hand feature points. The study introduces a hand detection, segmentation, and recognition algorithm based on skin colour and key hand feature points. First, the algorithm applies the Otsu adaptive threshold in the YCbCr colour space for skin colour detection, accommodating varying image brightness. Next, a hand keypoint model rapidly detects hands within skin-coloured regions while excluding interference from other areas. During hand tracking, a particle filter combined with the artificial fish swarm algorithm performs optimized prediction, achieving tracking accuracy within six pixels in both the horizontal and vertical directions. For gesture recognition, Fourier descriptor contour features of the hand skeleton are extracted from the identified feature points. The artificial fish swarm algorithm is further used to optimize the support vector machine's parameters, improving the recognition rate. On a test set of 800 processed images, five gestures commonly used in flight cockpits were recognized and classified with an accuracy of up to 98%.
Finally, this research renders virtual representations of common operational gestures, including the natural hand state, control stick manipulation, throttle handling, button presses, and rotational movements, via head-mounted displays or screens. The proposed mixed reality hand gesture interaction system addresses complex background interference and lighting disturbances during hand detection in cockpit environments. It also resolves tracking failures caused by insufficient particles and significantly improves registration during tracking. Moreover, it mitigates the sensitivity of Fourier descriptor features to background variations and pose changes, as well as their limited ability to express individual gesture forms, thereby improving overall classification accuracy.
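The Fourier descriptor features mentioned above can be illustrated with the classic contour formulation. This is a generic sketch, not the paper's exact skeleton-based variant: boundary points become complex numbers, their FFT is taken, and dropping the DC term plus magnitude normalization yields descriptors invariant to translation, scale, and rotation.

```python
import numpy as np

def fourier_descriptors(contour, n_coeffs=16):
    """Compute invariant Fourier descriptors of a closed contour.

    `contour` is an (N, 2) array of boundary points. Illustrative
    sketch of the standard contour Fourier descriptor.
    """
    # Represent each boundary point (x, y) as the complex number x + iy.
    z = contour[:, 0] + 1j * contour[:, 1]
    coeffs = np.fft.fft(z)
    # Drop the DC term (removes translation), keep the first n_coeffs
    # harmonics, and normalize by the first harmonic's magnitude
    # (removes scale); magnitudes discard rotation and start point.
    coeffs = coeffs[1:n_coeffs + 1]
    return np.abs(coeffs) / np.abs(coeffs[0])
```

A feature vector like this would be fed to the support vector machine classifier, whose parameters the paper tunes with the artificial fish swarm algorithm.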