Abstract

Computer vision-based action recognition of basketball players in training and competition has gradually become a research hotspot. However, owing to complex technical actions, diverse backgrounds, and limb occlusion, it remains a challenging task without effective solutions or public dataset benchmarks. In this study, we defined 32 kinds of atomic actions covering most of the complex actions of basketball players and built the NPU RGB+D dataset (a large-scale basketball action recognition dataset with RGB image data and depth data, captured at Northwestern Polytechnical University) for 12 kinds of actions performed by 10 professional basketball players, comprising 2,169 RGB+D videos and 75 thousand frames, including RGB frame sequences, depth maps, and skeleton coordinates. By extracting spatial features, namely the distances and angles between the joint points of basketball players, we created a new feature-enhanced skeleton-based method for basketball player action recognition, LSTM-DGCN, which combines a deep graph convolutional network (DGCN) with long short-term memory (LSTM). Many advanced action recognition methods were evaluated on our dataset and compared with our proposed method. The experimental results show that the NPU RGB+D dataset poses a competitive challenge to current action recognition algorithms and that our LSTM-DGCN outperforms state-of-the-art action recognition methods under various evaluation criteria on our dataset. Our action taxonomy and the NPU RGB+D dataset are valuable resources for basketball player action recognition research. The feature-enhanced LSTM-DGCN improves the motion expressiveness of the skeleton data and thereby achieves more accurate action recognition.
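The exact feature definitions used in the paper are not reproduced here, but the feature-enhancement idea, augmenting raw skeleton coordinates with pairwise joint distances and joint angles, can be sketched as follows. The function names and the (num_joints, 3) input layout are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def joint_distance_features(skeleton):
    """Pairwise Euclidean distances between all joints.

    skeleton: (num_joints, 3) array of 3D joint coordinates.
    Returns a 1D vector of the upper-triangular distances.
    """
    diff = skeleton[:, None, :] - skeleton[None, :, :]   # (J, J, 3) pairwise offsets
    dist = np.linalg.norm(diff, axis=-1)                 # (J, J) distance matrix
    iu = np.triu_indices(skeleton.shape[0], k=1)         # keep each pair once
    return dist[iu]

def joint_angle(a, b, c):
    """Angle (radians) at joint b formed by segments b->a and b->c."""
    v1, v2 = a - b, c - b
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-8)
    return np.arccos(np.clip(cos, -1.0, 1.0))            # clip guards rounding error
```

Features of this kind, computed per frame and stacked over time, could then be fed alongside the raw coordinates into a spatio-temporal model such as an LSTM over graph-convolutional features.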

Highlights

  • With the development of computer vision and deep learning technology, depth cameras are increasingly applied in human–computer interaction, autonomous driving, virtual reality, and other fields

  • Under the cross-subject and cross-view evaluation criteria, our method achieved accuracy rates of 87.0% and 94.6% on the NTU RGB image data and depth data (RGB+D) dataset, outperforming most mainstream action recognition methods and approaching the 88.5% and 95.1% of the state-of-the-art 2s-AGCN

  • We proposed LSTM-DGCN, a feature-enhanced action recognition method combining long short-term memory (LSTM) with a deep graph convolutional network (DGCN), based on the spatio-temporal relationships in skeleton data


Introduction

With the development of computer vision and deep learning technology, depth cameras are increasingly applied in human–computer interaction, autonomous driving, virtual reality, and other fields. One of the most popular application scenarios is the use of 3D vision for human action recognition and behavior analysis [1]. Human action recognition datasets in RGB image data and depth data (RGB+D) format have emerged [2]. However, few large-scale open RGB+D datasets are dedicated to basketball, so research on basketball player action recognition lacks a benchmark on which recognition methods can be evaluated and compared. Recent data-driven technology, such as deep learning methods, has a high demand for data, yet almost all related datasets have limitations in the following aspects
