Towards Accurate and Compact Architectures via Neural Architecture Transformer.

Yong Guo, Qi Chen, Zhipeng Li, Jian Chen, Junzhou Huang, Yin Zheng, Mingkui Tan, Peilin Zhao

doi:10.1109/tpami.2021.3086914

Abstract

Designing effective architectures is one of the key factors behind the success of deep neural networks. Existing deep architectures are either manually designed or automatically searched by some Neural Architecture Search (NAS) methods. However, even a well-designed/searched architecture may still contain many nonsignificant or redundant modules/operations (e.g., some intermediate convolution or pooling layers). Such redundancy may not only incur substantial memory consumption and computational cost but also deteriorate the performance. Thus, it is necessary to optimize the operations inside an architecture to improve the performance without introducing extra computational cost. To this end, we have proposed a Neural Architecture Transformer (NAT) method which casts the optimization problem into a Markov Decision Process (MDP) and seeks to replace the redundant operations with more efficient operations, such as skip or null connection. Note that NAT only considers a small number of possible replacements/transitions and thus comes with a limited search space. As a result, such a small search space may hamper the performance of architecture optimization. To address this issue, we propose a Neural Architecture Transformer++ (NAT++) method which further enlarges the set of candidate transitions to improve the performance of architecture optimization. Specifically, we present a two-level transition rule to obtain valid transitions, i.e., allowing operations to have more efficient types (e.g., convolution → separable convolution) or smaller kernel sizes (e.g., 5×5 → 3×3). Note that different operations may have different valid transitions. We further propose a Binary-Masked Softmax (BMSoftmax) layer to omit the possible invalid transitions. Last, based on the MDP formulation, we apply policy gradient to learn an optimal policy, which will be used to infer the optimized architectures. Extensive experiments show that the transformed architectures significantly outperform both their original counterparts and the architectures optimized by existing methods.
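
The abstract does not spell out how the Binary-Masked Softmax is computed, so the following is only a minimal sketch of the general idea: a softmax in which a 0/1 mask forces invalid transitions to probability zero. The operation list `OPS`, the helper `valid_mask`, and the "stay or move to a cheaper operation" rule are hypothetical and chosen for illustration; they are not the paper's actual transition rule or candidate set.

```python
import math

def binary_masked_softmax(logits, mask):
    """Softmax over entries where mask == 1; masked entries get probability 0."""
    if len(logits) != len(mask):
        raise ValueError("logits and mask must have the same length")
    if not any(mask):
        raise ValueError("at least one transition must be valid")
    # Subtract the largest valid logit for numerical stability.
    peak = max(l for l, m in zip(logits, mask) if m)
    weights = [math.exp(l - peak) if m else 0.0 for l, m in zip(logits, mask)]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical candidate operations, ordered from most to least expensive.
OPS = ["conv_5x5", "conv_3x3", "sep_conv_3x3", "skip", "null"]

def valid_mask(current_op):
    """Illustrative rule (an assumption): keep the op or move to a cheaper one."""
    idx = OPS.index(current_op)
    return [1 if j >= idx else 0 for j in range(len(OPS))]

# Policy scores for one edge whose current operation is a separable convolution.
logits = [2.0, 1.0, 0.5, 0.0, -1.0]
probs = binary_masked_softmax(logits, valid_mask("sep_conv_3x3"))
for op, p in zip(OPS, probs):
    print(f"{op:>13}: {p:.3f}")
```

Under this sketch, the two more expensive convolutions receive zero probability even though their raw scores are the highest, and the remaining probability mass is renormalized over the valid transitions only. A policy-gradient learner could then sample from `probs` without ever proposing an invalid replacement.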

Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence
Publication Date: Oct 1, 2022
Citations: 20

Similar Papers

Enhanced Gradient for Differentiable Architecture Search.
Haichao Zhang ... Lei Gao
IEEE Transactions on Neural Networks and Learning Systems | VOL. 35
01 Jul 2024

TF-MOPNAS: Training-free Multi-objective Pruning-Based Neural Architecture Search
Quan Minh Phan ... Ngoc Hoang Luong
01 Jan 2021

Efficient and lightweight convolutional neural network architecture search methods for object classification
Chuen-Horng Lin ... Yung-Kuan Chan
Pattern Recognition | VOL. 156
06 Jul 2024

A multi-perspective revisit to the optimization methods of Neural Architecture Search and Hyper-parameter optimization for non-federated and federated learning environments
Salabat Khan ... Do Hyuen Kim
Computers and Electrical Engineering | VOL. 110
20 Jul 2023
