Adversarial Detection Transformer For Kuzushiji Recognition
Kuzushiji recognition is an optical character recognition task that aims to recognize ancient Japanese characters. Kuzushiji has over 4000 categories, and many characters are visually similar, which poses a challenge for existing recognition methods. Moreover, Kuzushiji characters are often small and connected, making it difficult to locate character positions. To overcome these problems, we propose the Adversarial Detection Transformer (Adversarial-DETR), a method that learns to counteract noisy boxes and class labels in order to reconstruct the ground truth. In this paper, we treat the model's direct predictions as noisy predictions and propose Real Denoising (RDN) to exploit this prediction noise. We also introduce a target-aware focal loss (TFL) to accelerate convergence. Moreover, we propose a task-driven encoder-decoder (TED) structure based on the observation that features at different scales excel at different tasks. In experiments on the Kuzushiji dataset, Adversarial-DETR achieves a best F1 score of 0.941, outperforming state-of-the-art DETR variants and other detection methods.
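The abstract does not define the target-aware focal loss, but losses of this kind extend the standard focal loss of Lin et al., which down-weights well-classified examples so training focuses on hard ones. As a reference point only, here is a minimal sketch of the standard (non-target-aware) focal loss for a single example; the function name and signature are illustrative, not the paper's API.

```python
import math

def focal_loss(p: float, gamma: float = 2.0) -> float:
    """Standard focal loss for one positive example (not the paper's TFL).

    p: predicted probability assigned to the true class (0 < p <= 1).
    gamma: focusing parameter; gamma = 0 recovers plain cross-entropy.
    """
    # The (1 - p)^gamma modulating factor shrinks the loss for
    # confident (easy) predictions, emphasizing hard examples.
    return -((1.0 - p) ** gamma) * math.log(p)
```

With gamma = 2, a confident prediction (p = 0.9) contributes far less loss than an uncertain one (p = 0.5), which is the focusing behavior that focal-loss variants such as TFL build on.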