The Intelligent Table Tennis Serving Machine (TTSM) has revolutionized table tennis training by simulating the variability and precision of human opponents. However, current systems cannot track and analyze ball landing points, which limits comprehensive performance evaluation. To address this, we propose a novel approach that uses YOLOv8 as the baseline model for detecting table tennis balls launched by a TTSM, tailored to the demands of intelligent training systems. Our key innovation is the integration of a Res2Net architecture, enhanced with a Non-local Attention Module (NLAM) and a Dilated Atrous Spatial Pyramid Pooling (DASPP) unit, into the neck network. This design improves the network's ability to integrate multi-scale local features and thereby strengthens global information processing, while the DASPP module's adaptive dilated convolutions ensure thorough context comprehension. In addition, we incorporate an Omni-dimensional Dynamic Convolution (ODConv) module in the detection head, which employs a parallel strategy to learn diverse and complementary attention cues, further improving detection accuracy and robustness. To validate our method, we conducted evaluations on the publicly available OpenTTGames dataset and a self-constructed dataset of more than 10,000 table tennis ball images captured in various environments. Experimental results show that our method improves mAP@0.5, Precision, and Recall by 2.3%, 5.1%, and 0.9% on the OpenTTGames dataset, and by 3%, 2%, and 1.6% on the self-built dataset, respectively. Finally, we describe the optimization process of the intelligent TTSM system, which leverages keyframe analysis of ball landing points to provide feedback and thereby enables continuous performance improvement in subsequent training phases.
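The abstract does not give the exact definitions of the NLAM and DASPP modules; the following is a minimal PyTorch sketch of how such neck components are commonly structured, assuming the NLAM follows the standard non-local (self-attention) formulation and DASPP fuses parallel dilated 3x3 convolutions. All class names, channel sizes, and dilation rates here are illustrative placeholders, not the authors' implementation.

```python
import torch
import torch.nn as nn

class NonLocalAttention(nn.Module):
    """Simplified non-local block: self-attention over all spatial positions."""
    def __init__(self, channels: int, reduction: int = 2):
        super().__init__()
        inter = max(channels // reduction, 1)
        self.theta = nn.Conv2d(channels, inter, kernel_size=1)  # query
        self.phi = nn.Conv2d(channels, inter, kernel_size=1)    # key
        self.g = nn.Conv2d(channels, inter, kernel_size=1)      # value
        self.out = nn.Conv2d(inter, channels, kernel_size=1)

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.theta(x).flatten(2).transpose(1, 2)   # (B, HW, C')
        k = self.phi(x).flatten(2)                      # (B, C', HW)
        v = self.g(x).flatten(2).transpose(1, 2)        # (B, HW, C')
        attn = torch.softmax(q @ k / (k.shape[1] ** 0.5), dim=-1)
        y = (attn @ v).transpose(1, 2).reshape(b, -1, h, w)
        return x + self.out(y)                          # residual connection

class DASPP(nn.Module):
    """Dilated spatial pyramid pooling: parallel atrous branches + 1x1 fusion."""
    def __init__(self, in_channels: int, out_channels: int, dilations=(1, 3, 5)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(in_channels, out_channels, 3,
                          padding=d, dilation=d, bias=False),
                nn.BatchNorm2d(out_channels),
                nn.SiLU(inplace=True),
            )
            for d in dilations
        )
        self.fuse = nn.Conv2d(out_channels * len(dilations), out_channels, 1)

    def forward(self, x):
        # Each branch sees a different receptive field; fusion mixes the scales.
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))

# Illustrative usage on a neck feature map (shape chosen arbitrarily).
feat = torch.randn(1, 256, 40, 40)
neck_block = nn.Sequential(NonLocalAttention(256), DASPP(256, 256))
print(neck_block(feat).shape)  # torch.Size([1, 256, 40, 40])
```

The sketch only conveys the general idea described in the abstract: the non-local block injects global context through attention over all positions, and the dilated pyramid aggregates multi-scale local features before fusion; the paper's actual adaptive dilation scheme and ODConv head are not reproduced here.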