Abstract
Incremental learning is a methodology that continuously uses sequential input data to extend an existing network's knowledge. The layer-sharing algorithm is one of the representative methods; it leverages general knowledge by sharing some initial layers of the existing network. To determine the performance of the incremental network, it is critical to estimate how many of the initial convolutional layers in the existing network can be shared as fixed feature extractors. However, the existing algorithm selects the sharing configuration not through a proper optimization strategy but in a brute-force manner, searching over all possible numbers of shared layers. This is a non-convex and non-differentiable problem; accordingly, it cannot be solved with powerful optimization techniques such as gradient descent or other convex optimization methods, and the exhaustive search leads to high computational complexity. To solve this problem, we first define it as a discrete combinatorial optimization problem and propose a novel, efficient incremental learning algorithm based on Bayesian optimization, which guarantees global convergence for non-convex and non-differentiable optimization. Additionally, our proposed algorithm can adaptively find the optimal number of shared layers by adjusting the threshold-accuracy parameter in the proposed loss function. With the proposed method, the globally optimal sharing configuration can be found in only six to eight iterations, without searching all possible layer cases. Hence, the proposed method finds the globally optimal shared layers by utilizing Bayesian optimization, achieving both high combined accuracy and low computational complexity.
Highlights
Computer vision technologies, including image recognition and object detection, have developed rapidly in the field of deep learning [1,2]
Three conditions are needed for a successful incremental learning algorithm: (i) the subsequent data from new tasks should be trainable and accommodated incrementally without forgetting any knowledge from old tasks, i.e., the algorithm should not suffer from catastrophic forgetting [7]
BayesOpt consists of two major components: (1) Gaussian process (GP) regression, which statistically quantifies the uncertainty of an objective function, and (2) an acquisition function, which selects where to sample next [28,29]
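The two components can be illustrated with a minimal sketch: a GP posterior over a discrete set of candidate sharing depths, plus an expected-improvement acquisition function that picks the next depth to evaluate. The `combined_accuracy` objective, the RBF kernel settings, and the candidate range below are hypothetical stand-ins, not the paper's exact configuration (in practice each evaluation would retrain and test the branched network).

```python
import numpy as np

def rbf_kernel(a, b, length=2.0, var=1.0):
    """Squared-exponential kernel on scalar inputs (assumed hyperparameters)."""
    d = a[:, None] - b[None, :]
    return var * np.exp(-0.5 * (d / length) ** 2)

def gp_posterior(x_obs, y_obs, x_star, noise=1e-6):
    """GP posterior mean and std at candidate points x_star."""
    K = rbf_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    Ks = rbf_kernel(x_obs, x_star)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_obs))
    mu = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = np.diag(rbf_kernel(x_star, x_star)) - np.sum(v ** 2, axis=0)
    return mu, np.sqrt(np.maximum(var, 1e-12))

def expected_improvement(mu, sigma, best, xi=0.01):
    """EI acquisition: balances exploiting high mean vs exploring high variance."""
    from math import erf
    z = (mu - best - xi) / sigma
    cdf = 0.5 * (1.0 + np.vectorize(erf)(z / np.sqrt(2.0)))
    pdf = np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)
    return (mu - best - xi) * cdf + sigma * pdf

def combined_accuracy(n_shared):
    # Hypothetical objective: accuracy peaks at 5 shared layers.
    return 0.9 - 0.01 * (n_shared - 5) ** 2

candidates = np.arange(1.0, 13.0)      # 12 candidate sharing depths (assumed)
x_obs = np.array([1.0, 12.0])          # two initial evaluations
y_obs = np.array([combined_accuracy(x) for x in x_obs])

for _ in range(6):                     # a handful of BayesOpt iterations
    mu, sigma = gp_posterior(x_obs, y_obs, candidates)
    ei = expected_improvement(mu, sigma, y_obs.max())
    x_next = candidates[int(np.argmax(ei))]
    x_obs = np.append(x_obs, x_next)
    y_obs = np.append(y_obs, combined_accuracy(x_next))

best = int(x_obs[int(np.argmax(y_obs))])
print("best sharing depth found:", best)
```

Each loop iteration evaluates the objective once, so only a few network retrainings are needed rather than one per candidate depth, which is the source of the complexity savings described above.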
Summary
Computer vision technologies, including image recognition and object detection, have developed rapidly in the field of deep learning [1,2]. Focusing on the hardware and energy requirements of model updates [15,16,17], Sarwar et al. [14] proposed a 'clone-and-branch' technique that leverages general knowledge from previously learned tasks to learn subsequent new tasks by sharing some initial convolutional layers of the base network as fixed feature extractors and fine-tuning the new branch. The 'clone-and-branch' technique uses a two-step training methodology, consisting of an empirical search method and a similarity-scoring method
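The sharing step of clone-and-branch can be sketched as follows: the first k layers of the base network are frozen as fixed feature extractors, and the remaining layers are cloned into a new branch whose weights are then fine-tuned for the new task. The `Layer` class, layer sizes, and `clone_and_branch` helper are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

class Layer:
    """A toy fully-connected ReLU layer standing in for a conv layer."""
    def __init__(self, n_in, n_out):
        self.W = rng.standard_normal((n_in, n_out)) * 0.1
        self.trainable = True
    def forward(self, x):
        return np.maximum(x @ self.W, 0.0)

base_network = [Layer(8, 8) for _ in range(4)]

def clone_and_branch(base, k):
    """Share the first k layers (frozen) and clone the rest into a new branch."""
    shared = base[:k]
    for layer in shared:
        layer.trainable = False          # fixed feature extractor
    branch = []
    for layer in base[k:]:
        new = Layer(*layer.W.shape)
        new.W = layer.W.copy()           # clone weights, then fine-tune these
        branch.append(new)
    return shared + branch

new_task_net = clone_and_branch(base_network, k=2)
print([layer.trainable for layer in new_task_net])
```

Choosing k is exactly the sharing-configuration problem discussed above: too many frozen layers starve the new task of capacity, too few forfeit the reuse of general features, which is why the selection of k is posed as a discrete optimization problem.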