Abstract

Neural Ordinary Differential Equations (Neural ODEs) are a family of deep models that link conventional neural networks to dynamical systems, bridging the gap between theory and practice. However, little attention has been paid to their activation functions, and ReLU is typically used by default. Moreover, their dynamical behavior becomes increasingly unclear and complicated as training progresses. Existing studies have shown that activation functions are essential in governing the intrinsic dynamics of Neural ODEs. Motivated by a family of weight functions used to enhance the stability of dynamical systems, we introduce a new activation function, half-Swish, tailored to Neural ODEs. We also explore the effects of evolution time and batch size on Neural ODEs. Experiments show that our model consistently outperforms Neural ODEs with standard activation functions in robustness to both stochastic noise images and adversarial examples on the Fashion-MNIST, CIFAR-10, and CIFAR-100 datasets, which validates the applicability of half-Swish and suggests that it plays a positive role in regularizing the dynamics to enhance stability. Our work also provides a principled framework for choosing activation functions to match neural differential equations.
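For concreteness, the minimal sketch below shows the kind of setup the abstract describes: a Neural ODE whose dynamics function takes a pluggable activation, so that swapping ReLU for an alternative is a one-line change. This is not the paper's architecture. Since the abstract does not give half-Swish's closed form, we substitute PyTorch's built-in SiLU (Swish) as a stand-in, and the network shape, step count, and fixed-step Euler integrator are likewise illustrative assumptions (practical Neural ODEs typically use adaptive solvers such as those in torchdiffeq).

```python
import torch
import torch.nn as nn

class ODEFunc(nn.Module):
    """Dynamics f(t, h) of a toy Neural ODE with a swappable activation."""
    def __init__(self, dim, activation):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, dim),
            activation,            # e.g. nn.ReLU() or a half-Swish module
            nn.Linear(dim, dim),
        )

    def forward(self, t, h):
        return self.net(h)

def odeint_euler(func, h0, t0=0.0, t1=1.0, steps=20):
    """Fixed-step Euler integration of dh/dt = func(t, h) from t0 to t1."""
    h, dt = h0, (t1 - t0) / steps
    for i in range(steps):
        h = h + dt * func(t0 + i * dt, h)
    return h

# SiLU (Swish) stands in for half-Swish, whose exact form is defined in the paper.
func = ODEFunc(dim=16, activation=nn.SiLU())
h0 = torch.randn(8, 16)              # a batch of 8 hidden states
h1 = odeint_euler(func, h0, t1=1.0)  # t1 plays the role of the evolution time
print(h1.shape)                      # torch.Size([8, 16])
```

Keeping the activation as a constructor argument makes the comparison the abstract reports (ReLU versus half-Swish) trivial to reproduce, and exposing `t1` surfaces the evolution-time knob the paper studies.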
