Interplay between depth and width for interpolation in neural ODEs

Antonio Álvarez-López, Arselane Hadj Slimane, Enrique Zuazua

doi:10.1016/j.neunet.2024.106640

Abstract

Neural ordinary differential equations have emerged as a natural tool for supervised learning from a control perspective, yet a complete understanding of the role played by their architecture remains elusive. In this work, we examine the interplay between the width p and the number of transitions between layers L (corresponding to a depth of L+1). Specifically, we construct explicit controls interpolating either a finite dataset D, comprising N pairs of points in R^d, or two probability measures within a Wasserstein error margin ε > 0. Our findings reveal a balancing trade-off between p and L, with L scaling as 1 + O(N/p) for data interpolation, and as 1 + O(p^{-1} + (1+p)^{-1} ε^{-d}) for measures. In the high-dimensional and wide setting where d, p > N, our result can be refined to achieve L = 0. This naturally raises the problem of data interpolation in the autonomous regime, characterized by L = 0. We adopt two alternative approaches: either controlling in a probabilistic sense or relaxing the target condition. In the first case, when p = N, we develop an inductive control strategy based on a separability assumption whose probability increases with d. In the second, we establish an explicit error decay rate with respect to p, which results from applying a universal approximation theorem to a custom-built Lipschitz vector field interpolating D.

Full Text