Abstract

Deep Learning architectures can develop feature representations and classification models in an integrated way during training. This joint learning process requires large networks with many parameters, and it is successful only when a large amount of training data is available. Instead of making the learner develop its entire understanding of the world from scratch from the input examples, injecting prior knowledge into the learner offers a principled way to reduce the amount of required training data, as the learner does not need to induce the rules from the data. This paper presents a general framework to integrate arbitrary prior knowledge into learning. The domain knowledge is provided as a collection of first-order logic (FOL) clauses, where each task to be learned corresponds to a predicate in the knowledge base. The logic statements are translated into a set of differentiable constraints, which can be integrated into the learning process to distill the knowledge into the network, or used during inference to enforce the consistency of the predictions with the prior knowledge. Experiments carried out on multiple image datasets show that integrating the prior knowledge boosts the accuracy of several state-of-the-art deep architectures on image classification tasks.
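As a rough illustration of the idea described above, the following is a minimal sketch (not the authors' implementation) of how a FOL clause such as ∀x: cat(x) → animal(x) could be relaxed into a differentiable constraint and added to the training loss. The PyTorch setup, the product t-norm relaxation, the classifier `model`, the predicate indices, and the weighting factor `lambda_c` are all illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def implication_penalty(p_antecedent: torch.Tensor,
                        p_consequent: torch.Tensor) -> torch.Tensor:
    """Fuzzy relaxation of the clause A(x) -> B(x).

    Under the product t-norm the implication truth value is approximated
    by 1 - p_A * (1 - p_B); the penalty is its complement, so it vanishes
    when the consequent is predicted whenever the antecedent is.
    """
    truth = 1.0 - p_antecedent * (1.0 - p_consequent)
    return (1.0 - truth).mean()          # average over the mini-batch

def training_loss(model, x, y, idx_cat, idx_animal, lambda_c=0.1):
    # `model` is a hypothetical multi-label classifier returning one
    # logit per predicate; `y` holds the (possibly partial) supervision.
    scores = torch.sigmoid(model(x))                 # [batch, n_predicates]
    supervised = F.binary_cross_entropy(scores, y)
    constraint = implication_penalty(scores[:, idx_cat],
                                     scores[:, idx_animal])
    # The constraint term acts as a regularizer that distills the clause
    # into the network alongside the supervised loss.
    return supervised + lambda_c * constraint
```

The same relaxed constraint could, in principle, also be optimized at inference time over the output scores alone to enforce consistency of the predictions with the prior knowledge.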

Highlights

  • Deep Learning [1,2] has been a breakthrough for several classification and recognition problems

  • The methodology presented in this paper finds its roots and inspiration in the work carried out by the Statistical Relational Learning (SRL) community, which has proposed various probabilistic logic frameworks that integrate logic inference and probability, such as Markov Logic Networks (MLN) [7], Hidden Markov Logic Networks [8], Probabilistic Soft Logic [9] and ProbLog [10]

  • Several ablation studies have been performed to show how the prior knowledge can help in conditions where the training data is scarce


Introduction

Deep Learning [1,2] has been a breakthrough for several classification and recognition problems. This success has been possible thanks to the availability of large amounts of training data, the increased processing power of modern hardware, and well-designed, general-purpose software environments. However, deep neural networks have a major drawback: they depend heavily on large amounts of labeled data to develop powerful feature representations. Unsupervised data has so far played a minor role in the development of deep learning frameworks, where it has mainly been used to drive a proper initialization of the weights in a pre-training phase [4,5]. It is difficult and labor-intensive to manually annotate huge datasets in the era of big data.
