Abstract

Training a one-node neural network with the ReLU activation function via optimization, which we refer to as the ON-ReLU problem, is a fundamental problem in machine learning. In this paper, we begin by proving the NP-hardness of the ON-ReLU problem. We then present an approximation algorithm to solve the ON-ReLU problem, whose running time is $\mathcal {O}(n^k)$ where $n$ is the number of samples, and $k$ is a predefined integral constant as an algorithm parameter. We analyze the performance of this algorithm under two regimes and show that: (1) given any arbitrary set of training samples, the algorithm guarantees an $(n/k)$ -approximation for the ON-ReLU problem – to the best of our knowledge, this is the first time that an algorithm guarantees an approximation ratio for arbitrary data scenario; thus, in the ideal case (i.e., when the training error is zero) the approximation algorithm achieves the globally optimal solution for the ON-ReLU problem; and (2) given training sample with Gaussian noise, the same approximation algorithm achieves a much better asymptotic approximation ratio which is independent of the number of samples $n$ . Extensive numerical studies show that our approximation algorithm can perform better than the gradient descent algorithm. Our numerical results also show that the solution of the approximation algorithm can provide a good initialization for gradient descent, which can significantly improve the performance.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call