Abstract

Deep neural networks are static by nature: a single set of parameters processes every data sample. However, for larger and more complex systems, whose samples are obtained under a wide variety of circumstances, there is an apparent need for more dynamic networks that adapt to these variations. To this end, the concept of context information and its integration into deep learning is considered. Contrary to current practice, which treats all modalities as equally informative to the decision process, contextual and feature information are considered to be vastly different in nature. A set of definitions and subsequent arguments is given to clarify the interpretation of context, with the purpose of using all information in a maximally efficient way. From this interpretation, a problem statement of context-aware deep learning is constructed and linked to its multiple-model and transfer-learning solutions. These solutions, however, are inefficient, since samples are spread across models. Based on the existence of this multiple-model solution, a new approach is proposed that integrates contextual information directly into a single context-dependent model. This single model uses weight matrices that are dynamically assembled from the contextual information to process each data sample individually, which allows the single model to consume and learn from all samples. The corresponding training routine is constructed and evaluated on multiple benchmark problems. We start with an artificially generated problem, on which the method's ability to model multiple linear classification problems concurrently is confirmed. Next, both a time-series forecasting and an image classification dataset are used for evaluation. Our proposed method is evaluated and compared against standard context-aware implementations based on concatenation and gating.
These standard methods implement context information by adding parameters in an attempt to model all interactions between the context information and the samples. In contrast, our proposed approach integrates the contextual information directly into the network weights, allowing parameter-efficient modeling of dynamic contextual behavior. In both cases the proposed solution outperforms its standard counterparts by significant margins in both evaluation metrics and parameter efficiency. Specifically, a mean absolute error improvement of eleven standard deviations and an eight percent increase in classification accuracy are observed for the forecasting and image classification problems respectively, showing the potential of our approach.
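To make the core idea concrete, the following is a minimal sketch of context-dependent weight assembly. It is an illustration only, not the paper's actual architecture: all names and dimensions (`bases`, the number of basis matrices `K`, the softmax mixing) are assumptions chosen for the example, in which each sample's weight matrix is assembled as a context-weighted combination of a small bank of basis matrices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: feature size d, output size m, K weight bases.
d, m, K = 4, 3, 2

# A bank of K basis weight matrices; the context decides how they are mixed.
bases = rng.standard_normal((K, m, d))

def assemble_weights(context):
    """Assemble one weight matrix as a context-weighted sum of the bases.

    `context` is a length-K vector of raw scores; a softmax turns it into
    mixing coefficients, so the assembled matrix interpolates the bases.
    """
    coeffs = np.exp(context - context.max())
    coeffs /= coeffs.sum()
    return np.einsum("k,kmd->md", coeffs, bases)

def forward(x, context):
    """Process a single sample with weights assembled from its context."""
    W = assemble_weights(context)
    return W @ x

x = rng.standard_normal(d)
y1 = forward(x, np.array([5.0, -5.0]))  # context strongly favors basis 0
y2 = forward(x, np.array([-5.0, 5.0]))  # context strongly favors basis 1
# The same sample is processed by different effective weights per context,
# while all samples still train the one shared bank of parameters.
```

Note how the parameter count stays at `K * m * d` regardless of how many distinct contexts occur, in contrast to concatenation-based approaches that grow the input dimension to model context-sample interactions.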
