Abstract

In this paper, we make use of the 2-dimensional data obtained by applying t-distributed Stochastic Neighbor Embedding (t-SNE) to high-dimensional data of Urdu handwritten characters and numerals. The instances of the dataset used for the experimental work are grouped into multiple classes according to shape similarity. We performed three tasks in order: (i) since no publicly available dataset of this type exists to date, we generated a new dataset of Urdu handwritten characters and numerals by inviting native Urdu speakers from different social and academic groups; (ii) we applied classical dimensionality reduction and data visualization approaches, namely Principal Component Analysis (PCA) and Autoencoders (AE), in comparison with t-SNE; and (iii) we used the reduced dimensions obtained through PCA, AE, and t-SNE to recognize Urdu handwritten characters and numerals with a deep network, a Convolutional Neural Network (CNN). The recognition accuracy achieved for Urdu characters and numerals is found to be much better than that of other approaches for the same task. The novelty lies in the fact that the reduced dimensions, rather than the whole multidimensional data, are used for the first time to recognize Urdu handwritten text at the character level. Compared with recognition approaches that process the whole data of other datasets for the same task, this requires less computation time while achieving the same accuracy.
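The pipeline described above (reduce the dimensions first, then classify on the reduced features) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses synthetic data in place of the Urdu dataset, PCA computed with NumPy's SVD as the reduction step, and a nearest-centroid classifier as a deliberately simple stand-in for the CNN. The 28x28 image size and all function names are hypothetical.

```python
import numpy as np

def pca_fit(X, k):
    """Return the mean and the top-k principal axes of X (rows = samples)."""
    mean = X.mean(axis=0)
    # SVD of the centred data; rows of Vt are principal axes sorted by singular value
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]

def pca_transform(X, mean, axes):
    """Project samples onto the principal axes."""
    return (X - mean) @ axes.T

def nearest_centroid_fit(Z, y):
    """Stand-in classifier (NOT the paper's CNN): one mean vector per class."""
    classes = np.unique(y)
    centroids = np.stack([Z[y == c].mean(axis=0) for c in classes])
    return classes, centroids

def nearest_centroid_predict(Z, classes, centroids):
    # Squared Euclidean distance from every sample to every class centroid
    d = ((Z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[d.argmin(axis=1)]

# Hypothetical data: 3 classes of flattened 28x28 "images" (784 features each)
rng = np.random.default_rng(0)
n_per_class, n_features = 50, 784
prototypes = rng.normal(0, 1, size=(3, n_features))
X = np.vstack([p + 0.3 * rng.normal(size=(n_per_class, n_features)) for p in prototypes])
y = np.repeat(np.arange(3), n_per_class)

mean, axes = pca_fit(X, k=2)
Z = pca_transform(X, mean, axes)  # 784 columns reduced to 2
classes, centroids = nearest_centroid_fit(Z, y)
acc = (nearest_centroid_predict(Z, classes, centroids) == y).mean()  # training-set accuracy
print(Z.shape, acc)
```

Because the classifier here sees 2 features instead of 784, each distance computation touches 392 times fewer values. That is the kind of saving the abstract attributes to working on reduced dimensions, although the real cost depends on the classifier actually used.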

Highlights

  • Data visualization deals with presenting data in a visual context to make it easy for humans to understand the nature of the data [1]

  • Perplexity has a complex effect on the resulting visualizations, as explained in the original t-distributed Stochastic Neighbor Embedding (t-SNE) paper (Maaten and Hinton [9]). Selecting an optimal perplexity value is therefore important and requires care, since it can only be done by producing multiple visualizations with varying perplexity values. In this paper, we chose the best result based on the quality of the visualization

  • We use a deep convolutional neural network (CNN) model whose output layer produces the prediction from the learned feature maps in order to recognize Urdu handwritten characters
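To make the role of perplexity concrete, the sketch below follows the standard t-SNE formulation from Maaten and Hinton: for each point i, a Gaussian precision is searched by bisection until the perplexity 2^H(P_i) of the conditional distribution P_i matches a user-chosen target. It is an illustrative sketch only, not the authors' code; the function names, tolerance, and synthetic data are assumptions.

```python
import numpy as np

def conditional_probs(sq_dists_i, beta):
    """p_{j|i} for point i, given squared distances to the other points
    and precision beta = 1 / (2 * sigma_i**2)."""
    logits = -sq_dists_i * beta
    logits -= logits.max()  # numerical stability
    p = np.exp(logits)
    return p / p.sum()

def perplexity(p):
    """Perp(P_i) = 2 ** H(P_i), with the Shannon entropy H measured in bits."""
    p = p[p > 0]
    return 2.0 ** (-(p * np.log2(p)).sum())

def calibrate_beta(sq_dists_i, target_perp, tol=1e-5, max_iter=100):
    """Hypothetical helper: bisection on beta until Perp(P_i) matches target_perp."""
    lo, hi, beta = 0.0, np.inf, 1.0
    for _ in range(max_iter):
        perp = perplexity(conditional_probs(sq_dists_i, beta))
        if abs(perp - target_perp) < tol:
            break
        if perp > target_perp:  # distribution too flat: increase the precision
            lo = beta
            beta = beta * 2 if hi == np.inf else (beta + hi) / 2
        else:                   # distribution too peaked: decrease the precision
            hi = beta
            beta = (beta + lo) / 2
    return beta

# Synthetic example: squared distances from point 0 to 199 other points in 10-D
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
sq = ((X[1:] - X[0]) ** 2).sum(axis=1)
for target in (5, 30, 50):
    beta = calibrate_beta(sq, target)
    print(target, beta, perplexity(conditional_probs(sq, beta)))
```

A larger perplexity yields a smaller precision (a wider Gaussian), so each point effectively considers more neighbours. This is why different perplexity values produce visibly different embeddings, and why several values have to be tried.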



Introduction

Data visualization deals with presenting data in a visual context to make it easy for humans to understand the nature of the data [1]. When analyzing high-dimensional data, most researchers are interested in finding the optimal number of dimensions (or features) so that an appropriate classifier can perform better (Nguyen and Holmes [3]; Song et al. [4]; ur Rehman et al. [5]). The term "dimensionality" refers to the number of variables, characteristics, or features with which most datasets in data science are described today. These dimensions are represented as columns, and the main purpose of dimensionality reduction is to reduce the number of columns. That is why dimensionality reduction approaches have become vitally important. Applied before any clustering or classification approach, they help reveal patterns in the dataset, if such patterns exist, and reduce the model's complexity, which helps avoid overfitting.
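One common heuristic for choosing how many columns to keep (not necessarily the one used in this paper) is to retain the smallest number of principal components whose cumulative explained variance reaches a threshold. The sketch below assumes a 99% threshold and synthetic data with 50 columns generated from only 5 latent factors; the function name is hypothetical.

```python
import numpy as np

def n_components_for_variance(X, threshold=0.95):
    """Smallest k whose top-k principal components explain >= threshold of the variance."""
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)  # singular values, descending
    ratio = s**2 / (s**2).sum()              # explained-variance ratio per component
    cumulative = np.cumsum(ratio)
    # First index where the cumulative ratio reaches the threshold, converted to a count
    return int(np.searchsorted(cumulative, threshold) + 1), cumulative

# Hypothetical data: 500 samples, 50 columns, driven by 5 latent factors plus small noise
rng = np.random.default_rng(1)
latent = rng.normal(size=(500, 5))
mixing = rng.normal(size=(5, 50))
X = latent @ mixing + 0.01 * rng.normal(size=(500, 50))

k, cumulative = n_components_for_variance(X, threshold=0.99)
print(k, cumulative[:6])
```

Here most of the 50 columns are redundant, so only a handful of components are needed to preserve almost all of the variance; a classifier trained on those components works with far fewer inputs.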

Review of the Approaches Used in Dimensionality Reduction
Our Motivation
Dataset Preparation
Experimental Results of Dimensionality Reduction Approaches
Recognition of Urdu Handwritten Characters Using Deep Network
Conclusion