Abstract

The analysis of over-parameterized neural networks has drawn significant attention in recent years. It has been shown that such systems behave like convex systems in various restricted settings, such as two-layer networks, or when learning is restricted locally to the so-called neural tangent kernel space around specialized initializations. However, powerful theoretical techniques for analyzing fully trained deep neural networks under general conditions are still lacking. This paper addresses this fundamental problem by investigating over-parameterized deep neural networks when they are fully trained. Specifically, we characterize a deep neural network by the distributions of its features and propose a metric that intuitively measures the usefulness of feature representations. Under certain regularizers that bound this metric, we show that deep neural networks can be reformulated as a convex optimization problem, and that the resulting system guarantees effective feature representations in terms of the metric. Our new analysis is more consistent with the empirical observation that deep neural networks are capable of learning efficient feature representations. Empirical studies confirm that the predictions of our theory are consistent with results observed in practice.
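
To give a concrete, hedged sense of what "measuring the usefulness of a feature representation" can mean, the sketch below uses a common proxy (linear-probe accuracy on a hidden representation) rather than the metric proposed in this paper; the random ReLU feature map, the toy two-circles dataset, and the ridge-regression probe are all illustrative assumptions, not the authors' construction.

```python
# Hedged sketch: a generic proxy for "feature usefulness" -- linear-probe
# accuracy on a representation. This is NOT the paper's proposed metric.
import numpy as np

rng = np.random.default_rng(0)

def make_circles(n=500):
    # Two noisy concentric circles: not linearly separable in input space.
    t = rng.uniform(0, 2 * np.pi, n)
    r = np.where(rng.random(n) < 0.5, 1.0, 2.0)
    X = np.c_[r * np.cos(t), r * np.sin(t)] + 0.1 * rng.standard_normal((n, 2))
    y = (r > 1.5).astype(float)
    return X, y

def relu_features(X, width=512):
    # Random over-parameterized ReLU layer standing in for a network's
    # hidden representation (purely illustrative).
    W = rng.standard_normal((X.shape[1], width)) / np.sqrt(X.shape[1])
    b = rng.standard_normal(width)
    return np.maximum(X @ W + b, 0.0)

def linear_probe_accuracy(F, y, lam=1e-3):
    # Ridge-regression linear probe; its accuracy is the usefulness proxy.
    F1 = np.c_[F, np.ones(len(F))]
    w = np.linalg.solve(F1.T @ F1 + lam * np.eye(F1.shape[1]), F1.T @ (2 * y - 1))
    return float(np.mean((F1 @ w > 0) == y.astype(bool)))

X, y = make_circles()
print("probe accuracy on raw inputs   :", linear_probe_accuracy(X, y))
print("probe accuracy on ReLU features:", linear_probe_accuracy(relu_features(X), y))
```

Under this proxy, a representation is "more useful" when a simple linear model on top of it predicts the labels more accurately; the abstract's metric plays an analogous role but is defined on the features' distributions.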
