The technical essentials of basketball are highly complex. To identify fine-grained basketball movements and improve the quality of basketball training, this study proposes a region screening strategy that extracts local region information from image frames, draws on the SlowFast network structure to design static and dynamic detail models, and uses a 3D attention feature fusion module to process the relationships among multiple features and complete recognition. Performance tests confirm that the dual-stream model based on region screening achieves maximum improvements in average accuracy of 16.35% and 60.36% in the recognition of RGB images, while preserving relatively complete information on the effective local regions of those images. The model based on the channel-domain attention mechanism raises average accuracy to over 60%. After dual-stream fusion, the time required to compute the model's recognition accuracy is reduced by about 50%. The model based on the 3D attention feature fusion module is more robust to changes in feature dimension. Its loss value decreases rapidly and converges to the minimum, its accuracy and recall reach 90.03% and 88.47% respectively, and its AUC reaches a maximum of 0.761. The proposed design contributes to the digital development of basketball teaching and training and enriches the theoretical body of knowledge on deep learning and action recognition.
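The dual-rate idea borrowed from SlowFast can be illustrated with a minimal sketch. It only shows how one clip might be sampled at two temporal rates, sparsely for a static (slow) branch and densely for a dynamic (fast) branch. The function name `dual_rate_sample` and the parameters `slow_stride` and `speed_ratio` are hypothetical, and the default values are illustrative assumptions, not settings reported by the study.

```python
from typing import List, Sequence, Tuple, TypeVar

Frame = TypeVar("Frame")


def dual_rate_sample(
    frames: Sequence[Frame],
    slow_stride: int = 16,   # assumed value: temporal stride of the slow (static) branch
    speed_ratio: int = 8,    # assumed value: how many times denser the fast branch samples
) -> Tuple[List[Frame], List[Frame]]:
    """Sample one clip at two frame rates, in the spirit of SlowFast.

    Returns (slow_frames, fast_frames). The slow branch sees few frames
    (appearance detail); the fast branch sees many frames (motion detail).
    """
    if slow_stride <= 0 or speed_ratio <= 0:
        raise ValueError("slow_stride and speed_ratio must be positive")
    if slow_stride % speed_ratio != 0:
        raise ValueError("slow_stride must be divisible by speed_ratio")

    fast_stride = slow_stride // speed_ratio
    slow_frames = list(frames[::slow_stride])
    fast_frames = list(frames[::fast_stride])
    return slow_frames, fast_frames


# Stand-in clip: integers play the role of decoded image frames.
clip = list(range(64))
slow_frames, fast_frames = dual_rate_sample(clip)
print(len(slow_frames), len(fast_frames))
```

In a full pipeline each branch would feed its own network and the resulting features would then be fused; region screening and 3D attention fusion are outside the scope of this sketch.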