Inferring the intended actions of road-sharing users with autonomous ground vehicles in particularly vulnerable ones like cyclists is considered one of the tough tasks facing the wide-spread deployment of autonomous ground vehicles. One of the main reasons for that is the scarcity of the available datasets for that task due to the difficulty in obtaining those datasets in real environments. In this work, we first propose a pipeline that can synthetically produce 3D LiDAR data of cyclists hand-signalling a set of intended actions that are commonly done in real environments. Given the synthetically-produced labelled 3D LiDAR data sequences, we trained a framework that can simultaneously detect, track and give predictions about the intended actions of multi-cyclists in the scene on time. The proposed framework was evaluated using both synthetic and real data from a physical 3D LiDAR sensor. Our proposed framework has scored competitive and robust results in both synthetic and real environments with 88% in F1 measure with higher frame per second rate (12.9 FPS) than the 3D LiDAR sensor frame rate (10 Hz).
Read full abstract