Abstract

Solid developments have been seen in deep-learning-based pose estimation, but few works have explored performance in dense crowds, such as a classroom scene; furthermore, no specific knowledge is considered in the design of image augmentation for pose estimation. A masked autoencoder was shown to have a non-negligible capability in image reconstruction, where the masking mechanism that randomly drops patches forces the model to build unknown pixels from known pixels. Inspired by this self-supervised learning method, where the restoration of the feature loss induced by the mask is consistent with tackling the occlusion problem in classroom scenarios, we discovered that the transfer performance of the pre-trained weights could be used as a model-based augmentation to overcome the intractable occlusion in classroom pose estimation. In this study, we proposed a top-down pose estimation method that utilized the natural reconstruction capability of missing information of the MAE as an effective occluded image augmentation in a pose estimation task. The difference with the original MAE was that instead of using a 75% random mask ratio, we regarded the keypoint distribution probabilistic heatmap as a reference for masking, which we named Pose Mask. To test the performance of our method in heavily occluded classroom scenes, we collected a new dataset for pose estimation in classroom scenes named Class Pose and conducted many experiments, the results of which showed promising performance.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call