VF-Net: Robustness Via Understanding Distortions and Transformations
Secure and dependable deployment of deep neural networks hinges on their ability to withstand distributional shifts and distortions. While data augmentation enhances robustness, its effectiveness varies across types of data corruption: it tends to help most when the test-time corruptions are perceptually similar to the augmentations or are high-frequency in nature. A natural response is to cover a broad spectrum of distortions, yet it is impractical to include every conceivable modification an image may undergo in the augmented data. Instead, we show that providing the model with a stronger inductive bias to learn the underlying concept of “change” offers a more reliable approach. To this end, we develop Virtual Fusion (VF), a technique that treats corruptions as virtual labels. Diverging from conventional augmentation, when an image undergoes any form of transformation, its label is extended with an identifier for that specific distortion. Our findings indicate that VF effectively enhances both clean accuracy and robustness to common corruptions. On previously unseen corruptions, it shows an $11.90\%$ performance improvement and a $12.78\%$ increase in accuracy. In similar corruption scenarios, it achieves a $7.83\%$ performance gain and a significant accuracy improvement of $22.04\%$ on robustness benchmarks.
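To make the core idea concrete, the following is a minimal sketch of how corruptions could be turned into virtual labels in a PyTorch-style training pipeline. The corruption set, the wrapper name `VirtualFusionDataset`, and the label-offset rule (each class-corruption pair mapped to its own virtual class) are illustrative assumptions, not the paper's exact implementation.

```python
import random
import torch
from torch.utils.data import Dataset

# Illustrative corruption functions operating on tensors in [0, 1];
# the paper's actual corruption set is an assumption here.
CORRUPTIONS = {
    "gaussian_noise": lambda x: (x + 0.1 * torch.randn_like(x)).clamp(0, 1),
    "brightness":     lambda x: (x + 0.2).clamp(0, 1),
    "contrast":       lambda x: ((x - x.mean()) * 0.5 + x.mean()).clamp(0, 1),
}


class VirtualFusionDataset(Dataset):
    """Wraps a labeled image dataset so that, when a corruption is applied,
    the sample's target encodes both its original class and the name of the
    applied distortion (a 'virtual' label)."""

    def __init__(self, base_dataset, num_classes, p_corrupt=0.5):
        self.base = base_dataset
        self.num_classes = num_classes
        self.p_corrupt = p_corrupt
        self.corruption_names = list(CORRUPTIONS)

    def __len__(self):
        return len(self.base)

    def __getitem__(self, idx):
        image, label = self.base[idx]
        if random.random() < self.p_corrupt:
            c_idx = random.randrange(len(self.corruption_names))
            image = CORRUPTIONS[self.corruption_names[c_idx]](image)
            # Virtual label: offset the class index by one block per
            # corruption, so "class k + gaussian_noise" is a distinct
            # target from the clean "class k".
            label = (c_idx + 1) * self.num_classes + label
        return image, label
```

Under this toy scheme, a classifier would be trained with `num_classes * (len(CORRUPTIONS) + 1)` output logits and could be evaluated on clean labels by taking the prediction modulo `num_classes`; this detail is likewise an assumption made for the sketch.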