Performing arithmetic using a neural network trained on images of digit permutation pairs

Marcus D Bloice, Andreas Holzinger, Peter M Roth

doi:10.1007/s10844-021-00662-9

Open Access

https://doi.org/10.1007/s10844-021-00662-9

Abstract

In this paper, a neural network is trained to perform simple arithmetic using images of concatenated handwritten digit pairs. A convolutional neural network was trained with images consisting of two side-by-side handwritten digits, where each image’s label is the sum of the two digits it contains. Crucially, the network was tested on permutation pairs that were not present during training, in an effort to see whether the network could learn the task of addition as opposed to simply mapping images to labels. A dataset was generated for all possible permutation pairs of length 2 for the digits 0–9 using MNIST as a basis for the images, with one thousand samples generated for each permutation pair. For testing, samples generated from previously unseen permutation pairs were fed into the trained network and its predictions were measured. Results were encouraging, with the network achieving an accuracy of over 90% on some permutation train/test splits. This suggests that the network first learned digit recognition and subsequently the further task of addition based on the two recognised digits. As far as the authors are aware, no previous work has concentrated on learning a mathematical operation in this way. This paper is an attempt to demonstrate that a network can learn more than a direct mapping from image to label, and that it can learn to analyse two separate regions of an image and combine what was recognised to produce the final output label.
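The dataset construction described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' code: it assumes that "permutation pairs of length 2" means all 100 ordered digit pairs with repetition allowed, it represents images as plain lists of pixel rows, and the function names, the `test_fraction` value, and the `images_by_digit` lookup are hypothetical. In practice `images_by_digit` would map each digit to its MNIST images.

```python
import itertools
import random


def all_digit_pairs():
    """All ordered pairs (a, b) with a, b in 0..9; repetition allowed (an assumption)."""
    return list(itertools.product(range(10), repeat=2))


def split_pairs(pairs, test_fraction=0.1, seed=0):
    """Hold out whole pairs, so that no test pair is ever seen during training."""
    rng = random.Random(seed)
    shuffled = list(pairs)
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]  # (train_pairs, test_pairs)


def concat_side_by_side(left, right):
    """Join two equal-height images (lists of pixel rows) horizontally."""
    if len(left) != len(right):
        raise ValueError("images must have the same height")
    return [l_row + r_row for l_row, r_row in zip(left, right)]


def make_samples(pairs, images_by_digit, samples_per_pair, seed=0):
    """Build (image, label) samples; the label is the sum of the two digits."""
    rng = random.Random(seed)
    samples = []
    for a, b in pairs:
        for _ in range(samples_per_pair):
            left = rng.choice(images_by_digit[a])
            right = rng.choice(images_by_digit[b])
            samples.append((concat_side_by_side(left, right), float(a + b)))
    return samples
```

Calling `make_samples(train_pairs, images_by_digit, 1000)` would give one thousand samples per training pair, matching the figure quoted in the abstract. Because the split is made over pairs rather than over individual images, a pair such as (3, 7) is either entirely in training or entirely in testing.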

Highlights

The aim of this study is to attempt to find experimental evidence that would suggest that a network can be trained to perform the task of addition, when supplied with image data containing two digits that should be summed
Errors presented here could be the result of misclassifications of the images themselves; this is unlikely, however, given that LeNet5 can classify MNIST with an accuracy of approximately 99%, albeit as a classification task and within a much different experimental setting, whereas the network here was trained as a regression problem rather than a classification problem
We have presented a neural network that achieves good results at the task of addition when trained with images of side-by-side digits labelled with their summations, and tested with digit combination pairs it has never seen

Summary

Introduction

The aim of this study is to attempt to find experimental evidence that would suggest that a network can be trained to perform the task of addition, when supplied with image data containing two digits that should be summed. In contrast to most work, the goal of this study was not to recognise digits, extract their numerical values from the images, and then (after the network’s character recognition procedure) perform some mathematical function on the numerical values. The task of this experiment was to determine whether the network could learn the logical task of the mathematical operation itself using an end-to-end approach. The work presented here outputs its predictions as a real number. A related approach used numbers of longer length and was able to generate many thousands of training samples, despite not using handwritten digits. In that work, some tasks were not learnable in an end-to-end manner, for example the addition of Roman numerals, but were learnable once broken into separate sub-tasks: first perceptual character recognition and then the cognitive arithmetic sub-task.
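Because the network outputs a real number rather than a class, some rule is needed to turn a prediction into a reported accuracy. The sketch below assumes one plausible rule: a prediction counts as correct when it rounds to the true sum. The source text here does not state the scoring rule, so both the rule and the function name `rounded_accuracy` are assumptions made for illustration.

```python
import math


def rounded_accuracy(predictions, targets):
    """Fraction of real-valued predictions that round (half up) to the integer target."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not predictions:
        raise ValueError("at least one prediction is required")
    correct = sum(
        1 for p, t in zip(predictions, targets) if math.floor(p + 0.5) == t
    )
    return correct / len(predictions)


# Three of the four predictions round to the true sum; 17.6 rounds to 18, not 17.
print(rounded_accuracy([3.2, 8.9, 17.6, 0.4], [3, 9, 17, 0]))  # prints 0.75
```

Rounding half up is used instead of Python's built-in `round`, which rounds halves to the nearest even integer and would treat 2.5 and 3.5 differently.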

Objectives

Results

Conclusion