CArDIS: A Swedish Historical Handwritten Character and Word Dataset

Amir Yavariabdi, Turgay Celik, Johan Hall, Huseyin Kusetogullari, Shivani Thummanapally, Sakib Rijwan

doi:10.1109/access.2022.3175197

Abstract

This paper introduces a new publicly available image-based Swedish historical handwritten character and word dataset named Character Arkiv Digital Sweden (CArDIS) (https://cardisdataset.github.io/CARDIS/). The samples in CArDIS are collected from 64,084 Swedish historical documents written by several anonymous priests between 1800 and 1900. The character dataset contains 116,000 Swedish alphabet images in RGB color space with 29 classes, whereas the word dataset contains 30,000 image samples of ten popular Swedish names as well as 1,000 region names in Sweden. To examine the performance of different machine learning classifiers on the CArDIS dataset, three experiments are conducted. In the first experiment, classifiers such as Support Vector Machine (SVM), Artificial Neural Network (ANN), k-Nearest Neighbor (k-NN), Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), and Random Forest (RF) are trained on existing character datasets, namely Extended Modified National Institute of Standards and Technology (EMNIST), IAM, and CVL, and tested on the CArDIS dataset. In the second and third experiments, the same classifiers as well as two pre-trained classifiers, VGG-16 and VGG-19, are trained and tested on the CArDIS character and word datasets. The experiments show that machine learning methods trained on existing handwritten character datasets struggle to recognize characters in the CArDIS dataset, which indicates that the characters in CArDIS have unique features and characteristics. Moreover, in the last two experiments, the deep learning-based classifiers provide the best recognition rates.
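
The first experiment described in the abstract is a cross-dataset evaluation: a classifier is fitted on one dataset and scored on another. The following is a minimal sketch of that protocol only, not the authors' code. It assumes toy 2-D Gaussian data in place of character images, and the names `make_domain`, `knn_predict`, and `accuracy` are hypothetical helpers introduced here for illustration. No CArDIS, EMNIST, IAM, or CVL data is used, and the printed numbers say nothing about the paper's results.

```python
import numpy as np


def make_domain(rng, centers, n_per_class=100, std=0.5):
    """Draw a toy 2-D dataset with one Gaussian blob per class (synthetic, assumed)."""
    X = np.vstack([rng.normal(c, std, size=(n_per_class, 2)) for c in centers])
    y = np.repeat(np.arange(len(centers)), n_per_class)
    return X, y


def knn_predict(X_train, y_train, X_test, k=1):
    """Plain k-NN: Euclidean distance, majority vote among the k nearest points."""
    # Pairwise squared distances, shape (n_test, n_train).
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]
    votes = y_train[nearest]
    return np.array([np.bincount(row).argmax() for row in votes])


def accuracy(y_true, y_pred):
    """Fraction of correctly predicted labels."""
    return float((y_true == y_pred).mean())


rng = np.random.default_rng(0)

# "Source" domain stands in for an existing dataset; "target" stands in for a
# dataset whose classes look different. Both layouts are made up.
src_centers = [(-2.0, 0.0), (2.0, 0.0)]
tgt_centers = [(0.0, -2.0), (0.0, 2.0)]

X_src_train, y_src_train = make_domain(rng, src_centers)
X_src_test, y_src_test = make_domain(rng, src_centers)
X_tgt_test, y_tgt_test = make_domain(rng, tgt_centers)

# Same fitted data, two different test sets.
in_domain_acc = accuracy(
    y_src_test, knn_predict(X_src_train, y_src_train, X_src_test)
)
cross_domain_acc = accuracy(
    y_tgt_test, knn_predict(X_src_train, y_src_train, X_tgt_test)
)

print(f"in-domain accuracy:    {in_domain_acc:.2f}")
print(f"cross-domain accuracy: {cross_domain_acc:.2f}")
```

In this toy setup the gap between the two scores comes entirely from the mismatch between the training and test distributions, which is the kind of effect a train-on-one, test-on-another experiment is designed to expose.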

Journal: IEEE Access	Publication Date: Jan 1, 2022
Citations: 6	License type: CC BY 4.0
