Abstract

3D point clouds contain rich topological geometric features that are crucial for driver head pose estimation. However, most recent works take only RGB or RGB-D data as input and do not exploit the topological features of the 3D point cloud. In this work, we propose a Multi-Stream Graph Convolution Network (MS-GCN) that incorporates topological, local, and global facial information for driver head pose estimation. MS-GCN contains three Graph Convolution Network (GCN) streams, designed to extract topological, local, and global features, respectively. In particular, RepVGG is introduced to extract the appearance feature, followed by one GCN stream that captures the global facial information. A second GCN stream captures the local appearance features associated with local facial landmarks. The third stream processes the topological geometric features of the 3D facial landmarks; the sparse 3D facial landmarks are located in the dense point cloud and serve as the input to this topological GCN. The three streams are finally fused with an attention mechanism. To the best of our knowledge, this is the first attempt to construct a 3D graph on a sparse point cloud of 3D facial landmarks to efficiently extract topological information for head pose estimation. Extensive experiments on the 300W-LP, AFLW2000, and BIWI datasets show that our method achieves state-of-the-art results among landmark-based methods and is competitive with landmark-free methods.
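
As a rough illustration of the two ideas named above (a graph built on sparse 3D landmarks, and attention-based fusion of stream features), the following minimal sketch may help. The abstract does not state how edges are chosen, which GCN layer is used, or how attention scores are computed, so the k-nearest-neighbour edges, the symmetric-normalized single layer, and the softmax over externally supplied scores are all assumptions made here for illustration. The function names (knn_adjacency, gcn_layer, attention_fuse) are hypothetical and are not taken from the paper.

```python
import math


def knn_adjacency(points, k):
    """Symmetric k-nearest-neighbour adjacency with self-loops (assumed graph rule)."""
    n = len(points)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        adj[i][i] = 1.0  # self-loop so every node keeps its own feature
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )
        for j in others[:k]:
            adj[i][j] = adj[j][i] = 1.0  # undirected edge
    return adj


def gcn_layer(adj, feats, weight):
    """One graph convolution: ReLU(D^-1/2 A D^-1/2 X W), with A including self-loops."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    out = []
    for i in range(n):
        # Aggregate normalized neighbour features.
        agg = [0.0] * len(feats[0])
        for j in range(n):
            if adj[i][j]:
                coeff = adj[i][j] / math.sqrt(deg[i] * deg[j])
                for d, x in enumerate(feats[j]):
                    agg[d] += coeff * x
        # Linear projection followed by ReLU.
        out.append([
            max(0.0, sum(agg[d] * weight[d][o] for d in range(len(agg))))
            for o in range(len(weight[0]))
        ])
    return out


def attention_fuse(streams, scores):
    """Softmax-weighted sum of per-stream feature vectors (scores would be learned)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    fused = [
        sum(w * vec[d] for w, vec in zip(weights, streams))
        for d in range(len(streams[0]))
    ]
    return fused, weights


if __name__ == "__main__":
    # Toy landmarks; real inputs would be 3D facial landmarks from the point cloud.
    landmarks = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (5.0, 5.0, 5.0)]
    adj = knn_adjacency(landmarks, k=2)
    weight = [[1.0, 0.0], [0.0, 1.0], [0.5, -0.5]]  # arbitrary 3x2 projection
    node_feats = gcn_layer(adj, [list(p) for p in landmarks], weight)
    # Mean-pool nodes into one topological vector; the other two are placeholders.
    topo = [sum(col) / len(node_feats) for col in zip(*node_feats)]
    fused, weights = attention_fuse([topo, [0.1, 0.2], [0.3, 0.4]], [0.5, 0.1, 0.2])
    print(fused, weights)
```

In a trained model the projection weights and attention scores would be learned, and the local and global stream vectors would come from the appearance branches rather than the placeholder values used here.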
