Abstract

Monolithic neural networks and end-to-end training have become the dominant trend in deep learning, but the steady increase in complexity and training costs has raised concerns about the effectiveness and efficiency of this approach. We propose modular training as an alternative strategy for building modular neural networks by composing neural modules that can be trained independently and then kept for future use. We analyse the requirements and challenges regarding modularity and compositionality and, with that information in hand, provide a detailed design and implementation guideline. We present experimental results from applying this modular approach to a Visual Question Answering (VQA) task, starting from a previously published modular network, and evaluate its impact on final performance with respect to a baseline trained end-to-end. We also perform compositionality tests on CLEVR.
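
As a rough illustration of the composition idea described above, the following sketch shows one way to reuse an independently trained module by freezing its parameters and training a new module on top of it. PyTorch is assumed, and the module names (PerceptionModule, ReasoningModule) and file name are hypothetical placeholders, not the paper's actual architecture.

```python
# Minimal sketch (hypothetical names, PyTorch assumed): compose a frozen,
# previously trained module with a new module so that only the new module's
# parameters are updated during training.
import torch
import torch.nn as nn

class PerceptionModule(nn.Module):
    """Stand-in for a previously trained module kept for reuse."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )

    def forward(self, x):
        return self.encoder(x)

class ReasoningModule(nn.Module):
    """New module trained on top of the frozen, reused one."""
    def __init__(self, in_features=16, num_answers=10):
        super().__init__()
        self.head = nn.Linear(in_features, num_answers)

    def forward(self, features):
        return self.head(features)

perception = PerceptionModule()
# perception.load_state_dict(torch.load("perception.pt"))  # reuse stored weights
perception.requires_grad_(False)  # freeze the reused module
perception.eval()

reasoning = ReasoningModule()
optimizer = torch.optim.Adam(reasoning.parameters(), lr=1e-3)

images = torch.randn(8, 3, 64, 64)        # dummy batch
answers = torch.randint(0, 10, (8,))      # dummy labels
logits = reasoning(perception(images))    # composed forward pass
loss = nn.functional.cross_entropy(logits, answers)
loss.backward()                            # gradients reach only the new module
optimizer.step()
```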

Highlights

  • Deep learning has been demonstrated to be a powerful part of machine learning, enabling the automatic discovery of complex patterns in data and, as a result, finding solutions to problems previously considered very difficult or computationally unfeasible

  • We provide a definition of modularity and compositionality as they should be understood within this article

  • We have identified five main types of cases regarding dependencies that may appear during modular training


Introduction

Deep learning has been demonstrated to be a powerful part of machine learning, enabling the automatic discovery of complex patterns in data and, as a result, finding solutions to problems that had previously been considered very difficult to solve or computationally unfeasible. As the research community pushes the limits of this approach, the predominant trend is to design and train new monolithic neural networks for each new task, conducting the training in an end-to-end fashion. The steady increase in complexity of these tasks has made the amount of resources invested in training neural networks a growing concern. It has been pointed out that the training cost represents only a small fraction of the total development cost, as the greater part falls to hyperparameter optimization. This harms the environment and imposes access and creativity barriers on underfunded researchers. Prioritizing computational efficiency over raw performance has often been recommended.

