A scarce dataset for ancient Arabic handwritten text recognition

Rayyan Najam, Safiullah Faizullah

doi:10.1016/j.dib.2024.110813

Abstract

Deep-learning optical character recognition (OCR), in which deep neural networks are trained on data to extract the text contained in an image, is an active area of research. Although many advances are being made in this area in general, the Arabic OCR domain notably lacks a dataset of ancient manuscripts. Here, we fill this gap by providing both the images and the textual ground truth for a collection of ancient Arabic manuscripts. This scarce dataset was collected from the central library of the Islamic University of Madinah, and it encompasses rich text spanning different geographies across centuries. Specifically, the dataset contains forty pages drawn from eight ancient books, with each page provided both as an image and as text transcribed by experts. Because no comparable data is publicly available, the dataset is of significant value. Researchers and practitioners can use it to develop, augment, validate, and test deep learning models and to assess how well they generalize, for both Arabic OCR and Arabic text correction.
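Paired page images and expert transcriptions of this kind are typically used for testing by scoring a model's output against the ground-truth text. The sketch below computes the character error rate (CER), a common OCR metric, between a reference transcription and an OCR hypothesis. This is an illustrative assumption: the abstract does not say which metric the authors use. The helper names `levenshtein` and `character_error_rate` and the sample strings are hypothetical.

```python
import unicodedata


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = edit distance / number of reference characters.

    Hypothetical helper: CER is assumed here as the evaluation metric;
    the dataset's abstract does not prescribe one.
    """
    # Normalize both strings so that equivalent Unicode encodings of the
    # same Arabic text compare equal.
    ref = unicodedata.normalize("NFC", reference)
    hyp = unicodedata.normalize("NFC", hypothesis)
    if not ref:
        raise ValueError("reference transcription must be non-empty")
    return levenshtein(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Made-up example strings, not taken from the dataset.
    ground_truth = "بسم الله"
    ocr_output = "بسم اله"  # one letter missing
    print(character_error_rate(ground_truth, ocr_output))  # 0.125
```

One missing letter out of eight reference characters (including the space) gives a CER of 1/8. In Arabic text, each diacritic (haraka) is a separate combining code point and therefore counts as a character. Whether to strip diacritics before scoring is a design choice that changes the resulting numbers.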

Journal: Data in Brief
Publication Date: Aug 8, 2024
License type: CC BY

Similar Papers

Arabic Text Recognition and Machine Translation
Ihab Alkhoury
13 Jul 2015

Cursive-Text: A Comprehensive Dataset for End-to-End Urdu Text Recognition in Natural Scene Images
Asghar Ali Chandio ... Mehwish Leghari
Data in Brief, Vol. 31
21 May 2020

Arabic Scene Text Recognition in the Deep Learning Era: Analysis on a Novel Dataset
Heba Hassan ... Ahmed El-Mahdy
IEEE Access, Vol. 9
01 Jan 2020

Open Vocabulary Recognition of Offline Arabic Handwriting Text Based on Deep Learning
Zouhaira Noubigh ... Anis Mezghani
01 Jan 2020