Abstract

For an automated driving system to be robust, it must recognize not only fixed signals such as traffic signs and traffic lights, but also the gestures used by traffic police. To meet this requirement, this paper proposes a new gesture recognition technology based on a graph convolutional network (GCN), informed by an analysis of the characteristics of gestures used by Chinese traffic police. First, we used a spatial–temporal graph convolutional network (ST-GCN) as the base network and introduced an attention mechanism, which enhanced the effective features of traffic police gestures and balanced the information distribution of skeleton joints in the spatial dimension. Next, because the original graph structure represents only the physical structure of the human body and cannot capture potential effective features, this paper proposes an adaptive graph structure (AGS) model to explore hidden relationships between traffic police gesture nodes, and a temporal attention mechanism (TAS) to extract features in the temporal dimension. We established a traffic police gesture dataset containing 20,480 videos in total, and an ablation study was carried out to verify the effectiveness of the proposed method. The experimental results show that the proposed method improves the accuracy of traffic police gesture recognition to a certain degree; the top-1 accuracy is 87.72%, and the top-3 accuracy is 95.26%. In addition, to validate the method's generalization ability, we also carried out an experiment on the Kinetics–Skeleton dataset; the results show that the proposed method outperforms some existing action-recognition algorithms.
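The abstract's pipeline starts by turning skeleton joints into graph-structured data and applying spatial graph convolution. Below is a minimal, hedged sketch of one such spatial graph-convolution step over skeleton joints, in the style of GCN/ST-GCN layers. The joint count, feature size, and the toy chain skeleton are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def normalized_adjacency(A):
    """Symmetrically normalize A + I (self-loops), as in standard GCN layers."""
    A_hat = A + np.eye(A.shape[0])
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_hat @ D_inv_sqrt

# Toy 5-joint chain skeleton: joints 0-1-2-3-4 (e.g. a simplified arm).
A = np.zeros((5, 5))
for i, j in [(0, 1), (1, 2), (2, 3), (3, 4)]:
    A[i, j] = A[j, i] = 1.0

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))   # per-joint features (e.g. x, y, confidence)
W = rng.standard_normal((3, 8))   # learnable projection to 8 output channels

# One spatial layer: aggregate neighbor features through the normalized
# adjacency, then apply the learnable projection.
X_out = normalized_adjacency(A) @ X @ W
print(X_out.shape)  # (5, 8): one 8-channel feature vector per joint
```

An ST-GCN stacks such spatial layers with temporal convolutions over the frame axis; this sketch shows only the spatial half.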

Highlights

  • We used skeleton data of traffic police to construct graph-structured data as the input to the model; with the spatial–temporal graph convolutional network (ST-GCN) as the base network, we extracted the spatial–temporal features of traffic police gestures and recognized them

  • As the spatial–temporal graph convolutional network (ST-GCN) only learns the physical structure of the human body and ignores the potential relationships between joints across frames, we proposed the adaptive graph structure (AGS) to learn the potential relationships between nodes of traffic police skeletons
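The temporal attention mechanism described above can be sketched as follows: score each frame, softmax the scores over time, and reweight per-frame features so informative frames dominate. The scoring function here (a learned vector dotted with frame features) is an assumption for illustration, not necessarily the paper's TAS formulation.

```python
import numpy as np

def temporal_attention(X, v):
    """X: (T, C) per-frame features; v: (C,) learnable scoring vector.
    Returns frame features reweighted by a softmax over time, plus the weights."""
    scores = X @ v                                    # (T,) one score per frame
    scores = scores - scores.max()                    # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()   # softmax over the T frames
    return X * weights[:, None], weights

rng = np.random.default_rng(1)
X = rng.standard_normal((6, 4))   # 6 frames, 4 channels per frame
v = rng.standard_normal(4)
X_att, w = temporal_attention(X, v)
print(w.sum())  # weights form a distribution over frames (sums to 1)
```

In a full model, `v` would be learned jointly with the network, and the reweighted features would feed the subsequent temporal convolutions.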

Introduction

In just over a decade, automated driving technology [1,2] has achieved impressive breakthroughs in theoretical research and practical application, and automated driving vehicles are regarded as a research hotspot by universities and research institutions worldwide. At present, unmanned vehicles with automated driving technology can achieve good self-driving performance in simple, closed road environments. However, the existing technology cannot understand complex and uncertain road scenes as human drivers can; for example, scenes with bad weather, such as heavy snow or fog; irregular road situations, such as ponding water and narrow paths; and special road scenes, such as emergencies and multivehicle confluence. The core problem is that human-like understanding and interaction cognition in complex environments are difficult to achieve, which seriously affects the safety and reliability of vehicles. If this core problem cannot be solved, it will be very difficult for automated driving vehicles to reach level L4 and above, as defined by SAE J3016 [3].
