Evolving Neural Network Designs with Genetic Algorithms: Applications in Image Classification, NLP, and Reinforcement Learning

Rajesh Kumar Malviya, Machi Sathiri, Ravi Kumar Vankayalapti, Lakshminarayana Reddy Kothapalli Sondinti

doi:10.70179/grdjev09i120213

Abstract

We present a method for evolving deep learning architectures with genetic algorithms, as a first step towards low-cost evolutionary search for task-specific neural networks. We evolve model architectures optimized for fast execution and low error on several standard machine learning tasks: image classification, character-level language modeling, and the cart-pole problem. We also introduce a simple variation of the method that can evolve neural networks with recurrent connections of varying depth and length, and we report its performance on a word-level language modeling task. The method is implemented in an open-source library. We hope that the ability to run an evolutionary search at this scale will allow a wide audience to develop deep learning architectures specialized for a variety of tasks, and to discover novel architectural features.

We also present a new method that uses evolutionary search to directly modify existing neural network architectures for a specific task, and we demonstrate that such task-specific specialization can be useful in practice. We modify convolutional neural networks, residual networks, and an LSTM variant to perform various tasks, and show that the specialized networks often perform better than models trained from scratch that have many more parameters and much longer training times. For example, on the object recognition task, a specialized model is built by training a base network to predict object position and then applying a series of genetic search operations to shrink the network and fit new final-layer weights to the output. The specialized model is 8 times faster and has 13% lower error than a fully trained, larger and slower network, despite being 17 times smaller.
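To make the idea of a genetic search over architectures concrete, the following is a minimal sketch, not the authors' implementation or their library's API. It assumes a hypothetical genome (a list of hidden-layer widths) and a toy `fitness` function that stands in for "validation error plus execution cost"; a real search would train and time each candidate network at that point. All names (`WIDTHS`, `MAX_DEPTH`, `evolve`, and so on) are illustrative assumptions.

```python
import random

# Hypothetical search space: a genome is a list of hidden-layer widths.
WIDTHS = [8, 16, 32, 64, 128]
MAX_DEPTH = 6


def random_genome(rng):
    """Sample a random architecture with 1..MAX_DEPTH layers."""
    return [rng.choice(WIDTHS) for _ in range(rng.randint(1, MAX_DEPTH))]


def fitness(genome):
    """Toy surrogate for 'error + execution cost' (lower is better).

    Assumption: this formula is made up for illustration. A real search
    would train the candidate network and measure its error and speed.
    """
    capacity = sum(genome)
    surrogate_error = 1.0 / (1.0 + capacity / 64.0)  # more capacity, less error
    params = sum(a * b for a, b in zip(genome, genome[1:]))  # dense-chain weights
    return surrogate_error + 0.5 * params / 10000.0


def mutate(genome, rng):
    """Insert, delete, or resize one layer; fall back to resize if not allowed."""
    child = list(genome)
    op = rng.choice(["resize", "insert", "delete"])
    if op == "insert" and len(child) < MAX_DEPTH:
        child.insert(rng.randrange(len(child) + 1), rng.choice(WIDTHS))
    elif op == "delete" and len(child) > 1:
        del child[rng.randrange(len(child))]
    else:
        child[rng.randrange(len(child))] = rng.choice(WIDTHS)
    return child


def crossover(a, b, rng):
    """One-point crossover: a prefix of `a` joined to a suffix of `b`."""
    child = a[:rng.randint(0, len(a))] + b[rng.randint(0, len(b)):]
    # Clip to the depth limit; replace an empty child with a single random layer.
    return child[:MAX_DEPTH] or [rng.choice(WIDTHS)]


def tournament(pop, rng, k=3):
    """Pick the fittest of k randomly sampled individuals."""
    return min(rng.sample(pop, k), key=fitness)


def evolve(generations=20, pop_size=20, seed=0):
    """Run the search; return the best genome and the best fitness per generation."""
    rng = random.Random(seed)
    pop = [random_genome(rng) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=fitness)
        history.append(fitness(pop[0]))
        elite = pop[:2]  # elitism: the two best survive unchanged
        children = []
        while len(children) < pop_size - len(elite):
            child = crossover(tournament(pop, rng), tournament(pop, rng), rng)
            if rng.random() < 0.3:
                child = mutate(child, rng)
            children.append(child)
        pop = elite + children
    return min(pop, key=fitness), history


if __name__ == "__main__":
    best, history = evolve()
    print("best genome:", best, "fitness:", round(fitness(best), 4))
```

Because the two best individuals are carried over unchanged, the best fitness recorded in `history` never gets worse from one generation to the next. Replacing `fitness` with a function that trains and benchmarks each candidate is where nearly all of the compute in a real search would go.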

Full Text