Abstract

The influence of deep learning continues to expand across domains, and new applications are ubiquitous. The question of neural network design thus grows in importance, as traditional empirical approaches reach their limits. Manual design of network architectures from scratch relies heavily on trial and error, while reusing existing pretrained models can introduce redundancies or vulnerabilities. Automated neural architecture design can overcome these problems, but the most successful algorithms operate on significantly constrained design spaces, assuming the target network consists of identical repeating blocks. While such an approach allows for faster search, it does so at the cost of expressivity. We instead propose an alternative probabilistic representation of the whole neural network structure under the assumption of independence between layer types. Our matrix of probabilities is equivalent to a population of models, yet allows for the discovery of structural irregularities while remaining simple to interpret and analyze. We construct an architecture search algorithm, inspired by estimation of distribution algorithms, to take advantage of this representation. The probability matrix is tuned towards generating high-performance models by repeatedly sampling architectures and evaluating the corresponding networks, while gradually increasing the model depth. Our algorithm is shown to discover non-regular models which cannot be expressed via blocks but are competitive in both accuracy and computational cost, while not relying on complex dataflows or advanced training techniques, and remaining conceptually simple and highly extensible.
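The search loop described above — sample architectures from a matrix of layer-type probabilities, evaluate them, and shift the probabilities toward the best samples — can be sketched in the spirit of estimation of distribution algorithms (a PBIL-style update). The layer-type set, function names, and update rule below are illustrative assumptions, not the authors' exact ASED implementation, and the gradual depth increase is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer-type vocabulary; each depth position picks one type.
LAYER_TYPES = ["conv3x3", "conv5x5", "maxpool", "identity"]

def sample_architecture(P):
    """Sample one layer type per depth position, independently per row of P."""
    return [int(rng.choice(len(LAYER_TYPES), p=row)) for row in P]

def update(P, elites, lr=0.1):
    """Shift each row's distribution toward the elite samples' choices (PBIL-style)."""
    for arch in elites:
        for d, t in enumerate(arch):
            target = np.zeros(len(LAYER_TYPES))
            target[t] = 1.0
            P[d] = (1 - lr) * P[d] + lr * target
    return P

def search(evaluate, depth=4, generations=20, pop=16, elite_frac=0.25):
    """Tune the probability matrix toward architectures that score well."""
    # Start from a uniform distribution over layer types at every depth.
    P = np.full((depth, len(LAYER_TYPES)), 1.0 / len(LAYER_TYPES))
    for _ in range(generations):
        archs = [sample_architecture(P) for _ in range(pop)]
        scores = [evaluate(a) for a in archs]          # network training/eval in practice
        order = np.argsort(scores)[::-1]               # best scores first
        elites = [archs[i] for i in order[: max(1, int(elite_frac * pop))]]
        P = update(P, elites)
    return P
```

In a real search, `evaluate` would train and score the CNN that an architecture encodes; here any scalar fitness over the sampled layer indices suffices to drive the probability matrix toward high-scoring structures.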

Highlights

  • The recent successes of deep learning have attracted significant interest in numerous fields of knowledge [1]

  • Computer vision in particular has witnessed the development of multiple successful models, based on convolutional neural networks (CNNs), for tasks such as classification [2], [3], semantic segmentation [4], and detection [5]

  • We propose a CNN architecture search method based on the optimization of the above prototype, denoted Architecture Search by Estimation of network structure Distributions (ASED)

Introduction

The recent successes of deep learning have attracted significant interest in numerous fields of knowledge [1]. While the growth of deep learning solutions over the years is impressive, their adoption brings many significant challenges of both a theoretical and a practical nature. In addition to well-known problems such as overfitting and vanishing gradients, which have been subjects of extensive research over the years, new issues that are not yet fully understood continue to be discovered. The lack of interpretability of decisions made by deep models [6], [7] is a difficult problem to tackle, but it has attracted increasing research attention recently [8]. Further concerns have been raised regarding the secure practical use of common deep models, as they have been shown to be vulnerable to attacks utilizing malicious data [9].
