Abstract

Driver distraction detection is an essential component of human-centric driving systems in intelligent vehicles, even as vehicle automation increases. Existing driver distraction detection methods usually achieve superior accuracy with deep models. However, to deploy on embedded systems with limited computing resources, these models must undergo compression, which sacrifices their performance. To bridge this gap, we propose to train an accurate, fast, and lightweight model through multi-teacher knowledge distillation. Current approaches to multi-teacher knowledge distillation usually either randomly select a teacher model and apply its prediction as the soft label, or average all the teachers' predictions to form the soft label. These approaches are not ideal: each teacher network's knowledge is different, and not all randomly selected teacher knowledge is correct. Specifically, some teacher predictions assign high probability to incorrect classes. In this paper, we propose to grade each teacher model's prediction against the ground truth for every instance and assign different weights to the teachers based on the grade. To this end, we design a simple grading module to grade each prediction made by the teacher models. The predictions made by each teacher are dynamically weighted through the grading module to obtain the final soft label used to train the student model. We perform extensive validation on the AUCD3 and SFD2 datasets and verify the effectiveness of our proposed approach. The distilled student models obtain a 1%–6% improvement in accuracy while retaining their original model size. Implementation on a Jetson Nano 2GB edge device confirms that our distilled model achieves fast and accurate predictions while remaining small in size.
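
The sketch below illustrates the general idea of grading teachers per instance and weighting their softened predictions into one soft label, assuming a standard PyTorch distillation setup. The specific grading rule (probability assigned to the ground-truth class), the temperature, and the loss mixing weight are illustrative assumptions, not the authors' exact formulation.

```python
# Minimal sketch (PyTorch) of per-instance teacher grading and weighted
# soft-label construction for multi-teacher knowledge distillation.
# grade_teachers, TEMPERATURE, and ALPHA are illustrative assumptions.
import torch
import torch.nn.functional as F

TEMPERATURE = 4.0  # softening temperature (assumed)
ALPHA = 0.7        # mix between distillation loss and cross-entropy (assumed)


def grade_teachers(teacher_logits, labels):
    """Grade each teacher per instance by the probability it assigns to the
    true class, then normalize the grades into per-teacher weights."""
    grades = []
    for logits in teacher_logits:                      # each: [batch, classes]
        probs = F.softmax(logits, dim=1)
        grades.append(probs.gather(1, labels.unsqueeze(1)).squeeze(1))
    grades = torch.stack(grades, dim=0)                # [teachers, batch]
    return grades / grades.sum(dim=0, keepdim=True)    # per-instance weights


def distillation_loss(student_logits, teacher_logits, labels):
    """Combine graded teachers into one soft label, then mix the KL
    distillation term with the usual cross-entropy on hard labels."""
    weights = grade_teachers(teacher_logits, labels)   # [teachers, batch]
    soft_targets = torch.zeros_like(
        F.softmax(teacher_logits[0] / TEMPERATURE, dim=1)
    )
    for w, logits in zip(weights, teacher_logits):
        soft_targets += w.unsqueeze(1) * F.softmax(logits / TEMPERATURE, dim=1)

    kd = F.kl_div(
        F.log_softmax(student_logits / TEMPERATURE, dim=1),
        soft_targets,
        reduction="batchmean",
    ) * (TEMPERATURE ** 2)
    ce = F.cross_entropy(student_logits, labels)
    return ALPHA * kd + (1 - ALPHA) * ce
```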
