Abstract

Recently, the Vision Transformer (ViT) has revolutionized various domains of computer vision. Transferring ViT models pre-trained on large-scale datasets has proven to be a promising approach for downstream tasks. However, traditional transfer methods introduce numerous additional parameters into the transformer blocks, posing new challenges when learning downstream tasks. To address this issue, this article proposes an efficient transfer method from the perspective of neural Ordinary Differential Equations (ODEs). On the one hand, the residual connections in transformer layers can be interpreted as the numerical integration of a differential equation, so a transformer block can be described by two explicit Euler update equations. By dynamically learning the step size of each explicit Euler equation, a highly lightweight method for transferring the transformer blocks is obtained. On the other hand, a new learnable neural memory ODE block is proposed, inspired by the self-inhibition mechanism in neural systems. It increases the diversity of the neurons' dynamical behaviours, allowing the head block to be transferred efficiently while also enhancing non-linearity. Experimental results on image classification demonstrate that the proposed approach transfers ViT models effectively and outperforms state-of-the-art methods.
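As a minimal sketch of the step-size idea (not the authors' implementation), the two residual updates of a standard transformer block can be written as explicit Euler steps x_{n+1} = x_n + h · f(x_n), with each step size h promoted to a learnable scalar while the pre-trained weights stay frozen. All module and parameter names below are assumptions for illustration:

```python
import torch
import torch.nn as nn

class EulerStepTransformerBlock(nn.Module):
    """Transformer block whose residual updates are explicit Euler steps.

    Each residual connection x <- x + f(x) is generalized to
    x <- x + h * f(x), where h is a learnable step size. For transfer,
    only h_attn and h_mlp are trained; everything else stays frozen.
    """

    def __init__(self, dim: int, num_heads: int = 8, mlp_ratio: int = 4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, mlp_ratio * dim),
            nn.GELU(),
            nn.Linear(mlp_ratio * dim, dim),
        )
        # Learnable Euler step sizes, initialised to 1.0
        # (recovering the standard residual connection).
        self.h_attn = nn.Parameter(torch.ones(1))
        self.h_mlp = nn.Parameter(torch.ones(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Euler step 1: x <- x + h_attn * MSA(LN(x))
        y = self.norm1(x)
        attn_out, _ = self.attn(y, y, y, need_weights=False)
        x = x + self.h_attn * attn_out
        # Euler step 2: x <- x + h_mlp * MLP(LN(x))
        x = x + self.h_mlp * self.mlp(self.norm2(x))
        return x


def freeze_except_step_sizes(model: nn.Module) -> None:
    """Hypothetical helper: train only the step sizes for lightweight transfer."""
    for name, p in model.named_parameters():
        p.requires_grad = name.endswith(("h_attn", "h_mlp"))
```

Under this reading, only two scalars per block are updated during transfer, which is what makes the approach lightweight compared with adapter-style methods that insert whole sub-networks into each block.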