Oracle bone characters (OBCs) are ancient ideographs used for divination and memorization, and they provide first-hand evidence of ancient Chinese culture. OBC detection is a prerequisite for advanced studies and has traditionally been performed by authoritative experts. Deep learning techniques have great potential to facilitate OBC detection, but the high cost of annotating OBCs makes labeled data scarce, which hinders their application. This paper proposes OBCTeacher, a novel OBC detection framework based on semi-supervised learning (SSL) that is designed to cope with the scarcity of labeled data. We first construct a large-scale OBC detection dataset. Through investigation, we find that spatial mismatching and class imbalance reduce the number of positive anchors and bias the predictions, which degrades both the quality of pseudo labels and the performance of OBC detection. To mitigate spatial mismatching, we introduce a geometric-priori-based anchor assignment strategy and a heatmap polishing procedure, which increase the number of positive anchors and improve the quality of pseudo labels. To address class imbalance, we propose a re-weighting method based on estimated class information and a contrastive anchor loss, which together prioritize learning across different OBC classes and yield better class boundaries. We evaluate our method in two settings: using only a small portion of the labeled data with the remainder treated as unlabeled, and using all labeled data together with extra unlabeled data. Our method outperforms other state-of-the-art methods and improves AP50:95 by an average of 11.97 over the supervised-only baseline. In addition, using only 20% of the labeled data, our method achieves performance comparable to the fully supervised baseline trained on 100% of the labeled data, demonstrating that it significantly reduces the dependence on labeled data for OBC detection.