Abstract

Deep learning is currently one of the most effective approaches in machine learning, with applications in image processing, computer vision, and natural language processing. The key technique underpinning its success is the automated learning of latent representations in data using neural networks with parametric hidden variables. However, these parameters are typically learned by non-convex optimization, which makes the global optimum hard to find. Inductive learning frameworks that guarantee global optimality have recently been developed for two-layer conditional models, using a learning strategy based on parametric transfer functions. However, these frameworks require optimization over large kernel matrices, hence they are slow to train and cannot be scaled to large datasets. In this thesis, we propose a novel optimization strategy that iteratively and greedily expands the subspace of kernels, interlaced with optimization of the network parameters within the low-rank subspace. The resulting approach significantly speeds up training while maintaining optimality and accuracy, and allows convex neural networks to be scaled to 10,000 examples for the first time.
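As a rough illustration of the kind of strategy the abstract describes (greedily expanding a low-rank kernel subspace, interlaced with parameter optimization in that subspace), the sketch below uses pivoted-Cholesky-style landmark selection on an RBF kernel and re-solves a simple ridge problem in the expanded subspace at each step. This is a hypothetical, minimal sketch, not the thesis's actual algorithm: the kernel choice, the ridge model, and all names and parameters (`rbf_kernel`, `greedy_low_rank_fit`, `rank`, `gamma`, `reg`) are illustrative assumptions.

```python
# Hypothetical sketch: greedy low-rank kernel subspace expansion,
# interlaced with parameter fitting in the current subspace.
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian (RBF) kernel matrix between rows of A and rows of B."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * sq)

def greedy_low_rank_fit(X, y, rank=50, gamma=1.0, reg=1e-3):
    n = X.shape[0]
    diag = np.ones(n)        # residual diagonal of K - L L^T (RBF kernel has unit diagonal)
    L = np.zeros((n, 0))     # low-rank factor, built one column at a time
    pivots, w = [], None
    for _ in range(rank):
        j = int(np.argmax(diag))        # greedily pick the worst-approximated point
        if diag[j] <= 1e-12:
            break
        k_j = rbf_kernel(X, X[j:j + 1], gamma).ravel()
        col = (k_j - L @ L[j]) / np.sqrt(diag[j])     # new Cholesky-style column
        L = np.hstack([L, col[:, None]])
        diag = np.maximum(diag - col**2, 0.0)
        pivots.append(j)
        # Interlaced parameter step: solve a ridge problem in the expanded subspace.
        w = np.linalg.solve(L.T @ L + reg * np.eye(L.shape[1]), L.T @ y)
    return L, w, pivots

# Toy usage on synthetic data.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=500)
L, w, pivots = greedy_low_rank_fit(X, y, rank=30)
print("training MSE:", float(np.mean((L @ w - y) ** 2)))
```

Because each greedy step only adds one column to the low-rank factor, the per-iteration cost stays linear in the number of examples rather than quadratic, which is the general motivation for avoiding work on the full kernel matrix.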
