Abstract

Deep neural networks have achieved high performance in image classification, image generation, voice recognition, natural language processing, etc.; however, they still confront several open challenges, such as the incremental learning problem, overfitting, hyperparameter optimization, and a lack of flexibility and multitasking. In this paper, we focus on the incremental learning problem, which concerns machine learning methodologies that continuously train an existing model with additional knowledge. To the best of our knowledge, the simplest and most direct solution to this challenge is to retrain the entire neural network after adding the new labels to the output layer. Alternatively, transfer learning can be applied, but only if the domain of the new labels is related to the domain of the labels on which the network has already been trained. In this paper, we propose a novel network architecture, namely the Brick Assembly Network (BAN), which allows a new label to be assembled into (or dismantled from) a trained neural network without retraining the entire network. In BAN, we train each label individually with a sub-network (i.e., a simple neural network) and then assemble the converged sub-networks, each trained for a single label, into a full neural network. For each label trained in a sub-network of BAN, we introduce a new loss function that minimizes the loss of the network using data from only one class. Applying one loss function per class label is unique and differs from standard neural network architectures (e.g., AlexNet, ResNet, InceptionV3), which use the values of a loss function computed over multiple labels to minimize the error of the network. The difference between the loss functions of previous approaches and the one we introduce is that we compute the loss values from the node values of the penultimate layer (which we name the characteristic layer) instead of the output layer, where the loss is computed between true labels and predicted labels. Experimental results on several benchmark datasets show that BAN has a strong capability of adding (and removing) labels to (and from) a trained network compared with a standard neural network and previous work.
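To make the assembly idea concrete, below is a minimal sketch (not the paper's exact formulation) of a BAN-style sub-network in PyTorch: each class's "brick" is trained on that class's data alone, with a loss computed on the characteristic (penultimate) layer rather than on output labels. The fixed all-ones target vector, the layer sizes, and the nearest-target scoring rule at assembly time are illustrative assumptions.

```python
# Illustrative sketch of a BAN-style per-class sub-network (PyTorch).
# The characteristic-layer target and the assembly/scoring rule are
# assumptions for illustration; the paper's exact loss is not shown here.
import torch
import torch.nn as nn

class SubNetwork(nn.Module):
    """One 'brick': trained on data from a single class only."""
    def __init__(self, in_dim=784, char_dim=32):
        super().__init__()
        self.features = nn.Sequential(
            nn.Linear(in_dim, 128), nn.ReLU(),
            nn.Linear(128, char_dim),          # characteristic layer
        )
        # Fixed per-class target for the characteristic layer (assumed).
        self.register_buffer("target", torch.ones(char_dim))

    def forward(self, x):
        return self.features(x)

    def loss(self, x):
        # Loss uses characteristic-layer activations, not output labels:
        # pull this class's activations toward the target vector.
        return ((self.forward(x) - self.target) ** 2).mean()

def train_brick(brick, loader, epochs=5, lr=1e-3):
    opt = torch.optim.Adam(brick.parameters(), lr=lr)
    for _ in range(epochs):
        for x in loader:                # loader yields ONE class's data
            opt.zero_grad()
            brick.loss(x).backward()
            opt.step()
    return brick

def assemble_predict(bricks, x):
    # Assemble converged bricks into a full network: the predicted class
    # is the brick whose characteristic layer best matches its target.
    errors = [((b(x) - b.target) ** 2).mean(dim=1) for b in bricks]
    return torch.stack(errors, dim=1).argmin(dim=1)

# Under this sketch, adding a new label means training one new brick and
# appending it to the list; removing a label means deleting its brick.
# No other brick is retrained.
```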

Highlights

  • Deep neural networks [1] have played an important role in many areas of the artificial intelligence field, such as image classification and object detection [2,3,4,5], image generation [6,7,8,9], speech recognition [10,11,12], text generation [13,14], etc.

  • The incremental learning problem is worth exploring because most neural network systems have a poor capability of adding new labels to their output layer after they have converged.

  • Roy et al. [17] have proposed a hierarchical deep convolutional neural network (TreeCNN) for solving the incremental learning problem by growing a trained network structure as new labels are added to the network.


Summary

Introduction

Deep neural networks [1] have played an important role in many areas of the artificial intelligence field, such as image classification and object detection [2,3,4,5], image generation [6,7,8,9], speech recognition [10,11,12], text generation [13,14], etc. The first solution to the incremental learning problem is to retrain the entire network after adding the new label to the output layer, which is time-consuming. To apply transfer learning to the image classification problem instead, we retain the convolutional layers of the neural network and retrain only its fully connected layers. Although this second solution is more efficient than the first, it has the restriction that the new label must come from a domain similar to that of the labels already trained in the network. To address these problems (i.e., the time-consuming nature of the retraining method and the domain restriction of the transfer learning method), we propose a novel network architecture, namely the brick assembly network (BAN). We release the implementation of our network architecture (our scripts are available at https://github.com/canboy123/ban).
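For comparison, here is a hedged sketch of the transfer-learning baseline described above: freeze the convolutional layers and retrain only the fully connected head. The ResNet-18 backbone and the single-layer head are assumptions for illustration, not choices taken from the paper.

```python
# Sketch of the transfer-learning baseline: keep (freeze) the
# convolutional layers, retrain only the fully connected layers.
# The ResNet-18 backbone is an illustrative assumption.
import torch.nn as nn
from torchvision import models

def make_transfer_model(num_classes):
    model = models.resnet18(weights="IMAGENET1K_V1")
    for p in model.parameters():        # freeze convolutional features
        p.requires_grad = False
    # Replace the fully connected head; only it will be trained.
    model.fc = nn.Linear(model.fc.in_features, num_classes)
    return model

# Note the restriction discussed above: this baseline works well only
# when the new labels come from a domain similar to the pretrained one.
```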

Related Work
Preliminaries
Brick Assembly Network
Pseudo-Code of the Brick Assembly Network
Parametric Characteristic Layer
Experiment Settings
Experiment Results and Discussion
Single Dataset
Multiple Datasets
Summary
Conclusions