Standardized striking movements are essential in badminton for improving player technique and reducing sports-related injuries. However, accurately detecting these movements against complex backgrounds while balancing precision and speed remains a significant challenge. To address this, we propose YOLO-HGNet, a novel model that integrates You Only Look Once (YOLO) with the Hourglass Network (HGNet) to enhance feature learning across multiple levels. By replacing standard convolutional modules with Depth-Wise Convolution (DWConv), we achieve substantial gains in processing efficiency. In addition, the model incorporates a mixed self-attention and convolution module (ACmix) and FocalModulation to improve object localization and recognition accuracy against complex backgrounds. Our method leverages action vectors and machine learning techniques to accurately detect and classify six key badminton strokes: backhand push, backhand net shot, forehand clear, forehand push, forehand lift, and forehand net shot. Empirical evaluations show that our approach achieves a mean Average Precision (mAP) of 96.1% for detecting badminton player postures, outperforming existing advanced methods by at least 8.8%. Furthermore, it achieves an average accuracy of 95.4% in classifying the six strokes. These results underscore the capability of YOLO-HGNet for precise and efficient pose detection, recognition, and classification of badminton strokes, supporting advances in sports science and athlete training.
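To illustrate the depthwise substitution mentioned above, the following PyTorch sketch shows a depthwise-separable block that could stand in for a standard convolution module. The class name, layer ordering, and hyperparameters are illustrative assumptions, not the paper's released implementation.

```python
# Minimal sketch of a DWConv block (assumed structure, not the authors' code).
import torch
import torch.nn as nn


class DWConvBlock(nn.Module):
    """Depthwise-separable replacement for a standard Conv-BN-SiLU module."""

    def __init__(self, in_ch: int, out_ch: int, k: int = 3, stride: int = 1):
        super().__init__()
        # Depthwise: one filter per input channel (groups == in_ch), which cuts
        # the multiply-accumulate cost relative to a full convolution.
        self.dw = nn.Conv2d(in_ch, in_ch, k, stride, padding=k // 2,
                            groups=in_ch, bias=False)
        # Pointwise 1x1 convolution mixes information across channels.
        self.pw = nn.Conv2d(in_ch, out_ch, 1, 1, 0, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.SiLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.act(self.bn(self.pw(self.dw(x))))


if __name__ == "__main__":
    x = torch.randn(1, 64, 80, 80)               # dummy feature map
    block = DWConvBlock(64, 128, k=3, stride=2)  # downsampling variant
    print(block(x).shape)                        # torch.Size([1, 128, 40, 40])
```

Under these assumptions, the depthwise-plus-pointwise pair approximates a standard k x k convolution at a fraction of the parameter and FLOP count, which is the efficiency trade-off the abstract refers to.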