Exploratory Analysis of MNIST Handwritten Digit for Machine Learning Modelling

Mohd Razif Shamsuddin, Shuzlina Abdul-Rahman, Azlinah Mohamed

doi:10.1007/978-981-13-3441-2_11

Abstract

This paper investigates the MNIST dataset, a collection of handwritten digit images derived from the larger NIST data pool. All images are 28 × 28 pixels in grayscale format. MNIST has been cited in much leading research and has become a benchmark for image recognition and machine learning studies. Many researchers have attempted to identify appropriate models and pre-processing methods for classifying the MNIST dataset. However, very little attention has been given to comparing binary and normalized pre-processed datasets and their effects on the performance of a model. In this study, the pre-processing results are presented as input datasets for machine learning modelling. Four different models are trained and validated with 4200 random test samples. The results show that the normalized images performed best with the Convolutional Neural Network model, at 99.4% accuracy.
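The two pre-processing variants compared in the abstract can be illustrated with a minimal sketch. It assumes 8-bit grayscale pixel values in the range 0 to 255, scaling to [0, 1] for the normalized variant, and a cut-off of 127 for the binary variant; the function names, the threshold value, and the synthetic image are illustrative assumptions, not details taken from the paper.

```python
import random


def normalize(image):
    """Scale 8-bit grayscale pixels from [0, 255] to floats in [0.0, 1.0]."""
    return [[pixel / 255.0 for pixel in row] for row in image]


def binarize(image, threshold=127):
    """Map each pixel to 0 or 1. The threshold of 127 is an assumed value."""
    return [[1 if pixel > threshold else 0 for pixel in row] for row in image]


# Synthetic stand-in for a single 28 x 28 MNIST-style image (not real data).
random.seed(0)
image = [[random.randint(0, 255) for _ in range(28)] for _ in range(28)]

normalized = normalize(image)  # grayscale detail kept, rescaled to [0, 1]
binary = binarize(image)       # grayscale detail discarded, pixels are 0 or 1
```

The normalized variant preserves stroke intensity, whereas the binary variant keeps only the digit's shape; either output could then be passed to a classifier as its input dataset.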

Full Text