Abstract

Current state-of-the-art image classifiers predict a single class label for an image. However, in many industry settings such as online shopping, images belong to a class hierarchy in which the first level represents the coarse-grained, most abstract class and subsequent levels represent progressively more specific classes. We propose a novel hierarchical image classification model, Condition-CNN, which addresses some of the shortcomings of the branching convolutional neural network (B-CNN) in terms of training time and fine-grained accuracy. It applies the Teacher Forcing training algorithm, in which the actual labels of the higher-level classes, rather than the predicted labels, are used to train the lower-level branches. The technique also prevents error propagation and thereby reduces the training time. Besides learning the image features for each level of classes, Condition-CNN also learns the relationship between different levels of classes as conditional probabilities, which are used to estimate class predictions during scoring. By feeding the estimated higher-level class predictions as priors to the lower-level class prediction, Condition-CNN achieves superior prediction accuracy while requiring fewer trainable parameters than the baseline CNN models. Validation results on the Kaggle Fashion Product Images data set show prediction accuracies of 99.8%, 98.1%, and 91.0% for Level 1, 2, and 3 classes respectively, which are higher than those of B-CNN and other baseline CNN models. Moreover, Condition-CNN used only 77.58% as many trainable parameters as B-CNN.
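The conditioning idea described above can be illustrated with a minimal NumPy sketch. It is not the Condition-CNN architecture itself: the function `child_probs`, the toy two-level hierarchy, and the choice to combine the image evidence with the prior by multiplication and renormalization are all assumptions made for illustration. The same function is fed the true parent label as a one-hot vector during training (Teacher Forcing) and the predicted parent distribution during scoring.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def child_probs(child_logits, parent_dist, cond):
    """Combine child-level image evidence with a prior from the parent level.

    child_logits: (batch, n_child) scores from the image features.
    parent_dist:  (batch, n_parent) one-hot true labels during training
                  (teacher forcing) or predicted probabilities during scoring.
    cond:         (n_parent, n_child); row p holds P(child | parent = p).
    """
    prior = parent_dist @ cond                # prior over child classes
    scores = softmax(child_logits) * prior    # weight image evidence by prior
    return scores / scores.sum(axis=-1, keepdims=True)

# Hypothetical toy hierarchy: 2 parent classes, each with 2 child classes.
cond = np.array([[0.5, 0.5, 0.0, 0.0],
                 [0.0, 0.0, 0.5, 0.5]])

# Image evidence that is ambiguous between child 0 and child 2.
child_logits = np.array([[2.0, 0.5, 1.9, 0.1]])

# Training: the true parent label is supplied, not the model's prediction.
true_parent = np.array([[1.0, 0.0]])
train_probs = child_probs(child_logits, true_parent, cond)

# Scoring: the estimated parent distribution serves as the prior.
pred_parent = softmax(np.array([[0.2, 1.5]]))
score_probs = child_probs(child_logits, pred_parent, cond)

print(train_probs.round(3), score_probs.round(3))
```

In this toy case the true parent label removes all probability mass from the children of the other parent during training, so a wrong higher-level prediction cannot propagate into the lower-level branch. At scoring time the predicted parent distribution shifts the ambiguous image evidence toward the children of the more likely parent.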
