Understanding traffic officer commands is a fundamental perception task for intelligent vehicles in driver assistance and autonomous driving. Previous studies have focused on recognizing explicit traffic command gestures but have not considered situations in which the traffic officer is directing traffic participants in other directions, which also influences the decision-making of the ego vehicle. To fill this gap, this article investigates visual skeleton-based recognition of traffic commands at road intersections, where both the command direction and the gesture must be determined. Specifically, a two-stage recognition framework covering four cross-shaped directions and eight command gestures is proposed. Two kinds of handcrafted features, upper-body geometric features and keypoint co-occurrence features, are constructed from estimated 2D human keypoint coordinates and heatmaps and are further integrated into a deep learning network. The first stage classifies human body orientation, while the second stage recognizes the command gesture, additionally using the output of the first stage. By combining the recognized body orientation and command gesture, the type of traffic command can then be inferred. For training and validation, a dataset named Chinese Traffic Command at Intersections (CTCX) is built. The proposed method achieves an edit accuracy of 89.67% on the CTCX test set, outperforming the compared methods and demonstrating its effectiveness. This work lays a foundation in this area and is expected to inspire further research on direction-aware traffic command recognition.
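The two-stage inference flow described above can be illustrated with a minimal sketch, assuming the 2D keypoints have already been estimated. The keypoint names (`l_shoulder`, `r_wrist`, ...), the specific geometric features (elbow angles and wrist offsets normalized by shoulder width), the placeholder labels `dir_0`..`dir_3` and `gesture_0`..`gesture_7`, and the toy stand-in classifiers are all hypothetical; they are not the article's features or networks, which also use keypoint heatmaps and co-occurrence features omitted here. The sketch only shows the data flow: stage 1 predicts body orientation, stage 2 receives the same features plus the stage-1 scores, and the two results are paired into a command.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Point = Tuple[float, float]

# Hypothetical placeholder labels: four cross-shaped directions, eight gestures.
ORIENTATIONS = [f"dir_{i}" for i in range(4)]
GESTURES = [f"gesture_{i}" for i in range(8)]


def joint_angle(a: Point, b: Point, c: Point) -> float:
    """Angle (radians) at vertex b between segments b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n1, n2 = math.hypot(*v1), math.hypot(*v2)
    if n1 == 0 or n2 == 0:
        return 0.0
    cos = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    return math.acos(max(-1.0, min(1.0, cos)))  # clamp for float safety


def upper_body_features(kp: Dict[str, Point]) -> List[float]:
    """Illustrative geometric features (an assumed choice): two elbow angles,
    then each wrist's offset from its shoulder divided by shoulder width."""
    ls, rs = kp["l_shoulder"], kp["r_shoulder"]
    width = math.dist(ls, rs) or 1.0
    feats = [
        joint_angle(ls, kp["l_elbow"], kp["l_wrist"]),
        joint_angle(rs, kp["r_elbow"], kp["r_wrist"]),
    ]
    for side, shoulder in (("l", ls), ("r", rs)):
        wx, wy = kp[f"{side}_wrist"]
        feats += [(wx - shoulder[0]) / width, (wy - shoulder[1]) / width]
    return feats


def argmax(xs: Sequence[float]) -> int:
    return max(range(len(xs)), key=lambda i: xs[i])


def recognize_command(
    kp: Dict[str, Point],
    stage1: Callable[[List[float]], List[float]],
    stage2: Callable[[List[float]], List[float]],
) -> Tuple[str, str]:
    feats = upper_body_features(kp)
    orient_scores = stage1(feats)  # stage 1: body orientation scores
    # Stage 2 sees the same features plus the stage-1 output.
    gesture_scores = stage2(feats + list(orient_scores))
    # The command is inferred from the (orientation, gesture) pair.
    return ORIENTATIONS[argmax(orient_scores)], GESTURES[argmax(gesture_scores)]


# Toy stand-ins for the learned networks (hypothetical, for illustration only).
def toy_stage1(f: List[float]) -> List[float]:
    return [0.7, 0.1, 0.1, 0.1]  # pretend the officer faces dir_0


def toy_stage2(f: List[float]) -> List[float]:
    scores = [0.0] * 8
    # f[5] is the right wrist's vertical offset; image y grows downward,
    # so a negative value means the wrist is raised above the shoulder.
    scores[1 if f[5] < 0 else 0] = 1.0
    return scores


kp = {
    "l_shoulder": (0.0, 0.0), "r_shoulder": (2.0, 0.0),
    "l_elbow": (0.0, 1.0), "l_wrist": (0.0, 2.0),
    "r_elbow": (2.0, -1.0), "r_wrist": (2.0, -2.0),
}
print(recognize_command(kp, toy_stage1, toy_stage2))  # -> ('dir_0', 'gesture_1')
```

Passing the stage-1 scores (rather than only the argmax label) into stage 2 is one plausible way to "use the output from the first stage"; the abstract does not specify which form the article uses.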