Abstract

Incremental learning is a methodology that continuously uses sequential input data to extend an existing network's knowledge. The layer-sharing algorithm is one of the representative methods; it leverages general knowledge by sharing some initial layers of the existing network. To determine the performance of the incremental network, it is critical to estimate how many of the initial convolutional layers in the existing network can be shared as fixed feature extractors. However, the existing algorithm selects the sharing configuration not through a proper optimization strategy but in a brute-force manner, searching over all possible sharing-layer cases. This is a non-convex and non-differentiable problem; accordingly, it cannot be solved with powerful optimization techniques such as gradient descent or other convex optimization methods, and the exhaustive search leads to high computational complexity. To solve this problem, we first define it as a discrete combinatorial optimization problem and propose a novel, efficient incremental learning algorithm based on Bayesian optimization (BayesOpt), which guarantees global convergence for non-convex and non-differentiable objectives. Additionally, the proposed algorithm can adaptively find the optimal number of sharing layers by adjusting the threshold accuracy parameter in the proposed loss function. With the proposed method, the globally optimal sharing layer can be found in only six to eight iterations, without searching all possible layer cases. Hence, the proposed method finds the globally optimal sharing layers by utilizing Bayesian optimization, achieving both high combined accuracy and low computational complexity.
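To make the role of the threshold accuracy parameter concrete, the following is a minimal Python sketch of the kind of objective described above: it rewards sharing as many initial layers as possible while penalizing any shortfall of the combined accuracy below the target. The `evaluate_combined_accuracy` callable, the penalty form, and `penalty_weight` are illustrative assumptions, not the paper's published formulation.

```python
# Hypothetical sketch of an objective with a threshold accuracy parameter.
# Higher scores are better: deeper sharing is rewarded, but combined
# accuracy falling below the target is penalized.

def objective(num_shared_layers: int,
              evaluate_combined_accuracy,   # trains the branch, returns accuracy
              target_accuracy: float,       # threshold accuracy parameter
              penalty_weight: float = 10.0) -> float:
    acc = evaluate_combined_accuracy(num_shared_layers)
    shortfall = max(0.0, target_accuracy - acc)   # how far below the target
    return num_shared_layers - penalty_weight * shortfall
```

Raising `target_accuracy` pushes the optimum toward fewer shared layers (more task-specific capacity), while lowering it permits deeper sharing and cheaper updates.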

Highlights

  • Computer vision technologies, including image recognition and object detection, have developed rapidly in the field of deep learning [1,2]

  • Three conditions are needed for a successful incremental learning algorithm: (i) the subsequent data from new tasks should be trainable and accommodated incrementally without forgetting any knowledge from old tasks, i.e., the algorithm should not suffer from catastrophic forgetting [7]

  • The BayesOpt consists of two major components: (1) Gaussian process (GP) regression, which statistically defines the uncertainty of an objective function, and (2) an acquisition function, which selects where to sample [28,29] (see the sketch below)
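The two components combine into a short search loop over the discrete candidate layer counts. Below is a minimal, hedged sketch using scikit-learn's GP regressor and expected improvement as the acquisition function; the candidate set, kernel, and iteration budget are assumptions, and `objective` is a single-argument scoring callable (e.g., the earlier sketch with its other parameters fixed via `functools.partial`).

```python
# Discrete Bayesian optimization over the number of shared layers:
# a GP models the objective's uncertainty, and expected improvement
# picks the next layer count to evaluate.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def bayes_opt_shared_layers(objective, candidates, n_iter=8, n_init=2, seed=0):
    rng = np.random.default_rng(seed)
    X = [int(x) for x in rng.choice(candidates, size=n_init, replace=False)]
    y = [objective(x) for x in X]
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5),
                                  alpha=1e-6,          # jitter for stability
                                  normalize_y=True)
    grid = np.asarray(candidates, dtype=float).reshape(-1, 1)
    for _ in range(n_iter - n_init):
        gp.fit(np.asarray(X, dtype=float).reshape(-1, 1), np.asarray(y))
        mu, sigma = gp.predict(grid, return_std=True)
        best = max(y)
        sigma = np.maximum(sigma, 1e-9)
        z = (mu - best) / sigma
        ei = (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)  # expected improvement
        X.append(int(candidates[int(np.argmax(ei))]))
        y.append(objective(X[-1]))
    return X[int(np.argmax(y))]   # best-scoring number of shared layers
```

With a small discrete search space, a handful of evaluations (consistent with the six to eight iterations reported in the abstract) is typically enough for the GP posterior to concentrate on the best candidate.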


Summary

Introduction

Computer vision technologies, including image recognition and object detection, have developed rapidly in the field of deep learning [1,2]. Sarwar et al. [14], focusing on the hardware and energy requirements of model updates [15,16,17], proposed a 'clone-and-branch' technique that leverages general knowledge from previously learned tasks to learn subsequent new tasks: some initial convolutional layers of the base network are shared as fixed feature extractors, while the new branch is fine-tuned. The 'clone-and-branch' training methodology consists of two steps, an empirical searching method and a similarity scoring method.
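For illustration, a hedged PyTorch sketch of the layer-sharing step: the first k blocks of the trained base network are shared as frozen feature extractors, and the remaining blocks are cloned into a new branch that is fine-tuned on the new task together with a fresh classifier head. The block granularity, `feat_dim`, and the head are assumptions for the sketch, not the authors' exact architecture.

```python
# 'Clone-and-branch' layer sharing: freeze the first k blocks, clone the rest.
import copy
import torch.nn as nn

def clone_and_branch(base_blocks: nn.Sequential, k: int,
                     num_new_classes: int, feat_dim: int) -> nn.Module:
    shared = base_blocks[:k]                  # shared initial layers
    for p in shared.parameters():
        p.requires_grad = False               # fixed feature extractor
    branch = copy.deepcopy(base_blocks[k:])   # trainable clone for the new task
    head = nn.Linear(feat_dim, num_new_classes)  # new task's classifier
    return nn.Sequential(shared, branch, nn.Flatten(), head)
```

Only `branch` and `head` receive gradient updates, so training cost shrinks as k grows, which is exactly the quantity the layer-selection problem optimizes.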

Step 1
Step 2
Problem Definition
Combined Classification Accuracy
Target Combined Classification Accuracy
Proposed Objective Function
Global Optimal Layer Selection via BayesOpt
Experiment Results
Implementation Details
Experimental Results on ResNet50 with CIFAR-100
Experimental Results on Case 1
Experimental Results on Case 2
Experimental Results on MobileNetV2 with EMNIST
Comparison of Experimental Results for the 'Clone-and-Branch' Technique
Conclusions