A dataset of oracle characters for benchmarking machine learning algorithms

Mei Wang, Weihong Deng

doi:10.1038/s41597-024-02933-w

Abstract

Oracle bone script is an ancient Chinese writing system engraved on turtle shells and animal bones, and it serves as a valuable resource for interpreting ancient culture, history, and language. We introduce the Oracle-MNIST dataset, comprising 28 × 28 grayscale images of 30,222 ancient characters from 10 categories, designed for benchmarking pattern classification, with particular challenges related to image noise and distortion. The training set consists of 27,222 images, and the test set contains 300 images per class (3,000 in total). Oracle-MNIST follows the same data format as the original MNIST dataset, enabling direct compatibility with existing classifiers and systems built for MNIST, but it constitutes a more challenging classification task than MNIST. The images of ancient characters suffer from (1) severe and distinctive noise caused by three thousand years of burial and aging and (2) widely varying writing styles among ancient Chinese writers, both of which make them realistic for machine learning research.
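Because the abstract states that Oracle-MNIST follows the MNIST data format, a reader for the classic MNIST IDX layout should in principle apply to it as well. The following is a minimal sketch under that assumption, not the authors' loader: it relies on the standard MNIST header (big-endian magic number 2051 for image files and 2049 for label files). The function names and the gzipped-file helper are hypothetical and introduced here only for illustration; the distributed file names are not given in the abstract.

```python
import gzip
import struct

IMAGE_MAGIC = 2051  # 0x00000803: unsigned bytes, 3 dimensions (MNIST image files)
LABEL_MAGIC = 2049  # 0x00000801: unsigned bytes, 1 dimension (MNIST label files)

# Split sizes stated in the abstract: 27,222 training images and
# 300 test images for each of the 10 classes.
TRAIN_SIZE = 27_222
TEST_SIZE = 10 * 300
TOTAL_SIZE = TRAIN_SIZE + TEST_SIZE


def parse_idx_images(raw: bytes):
    """Return a list of images, each a list of rows of 0-255 pixel values."""
    magic, count, rows, cols = struct.unpack(">IIII", raw[:16])
    if magic != IMAGE_MAGIC:
        raise ValueError(f"unexpected image magic number: {magic}")
    pixels = raw[16:]
    if len(pixels) != count * rows * cols:
        raise ValueError("pixel payload does not match the header")
    size = rows * cols
    return [
        [list(pixels[i * size + r * cols: i * size + (r + 1) * cols])
         for r in range(rows)]
        for i in range(count)
    ]


def parse_idx_labels(raw: bytes):
    """Return the class labels as a list of integers."""
    magic, count = struct.unpack(">II", raw[:8])
    if magic != LABEL_MAGIC:
        raise ValueError(f"unexpected label magic number: {magic}")
    labels = list(raw[8:])
    if len(labels) != count:
        raise ValueError("label payload does not match the header")
    return labels


def load_gzipped(path: str) -> bytes:
    """Hypothetical helper: read a gzip-compressed IDX file into memory."""
    with gzip.open(path, "rb") as handle:
        return handle.read()
```

With the dataset files at hand, `parse_idx_images(load_gzipped(path))` would be expected to yield 28 × 28 images, and the label values would be expected to fall in the range 0 to 9, matching the 10 categories described above.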

Journal: Scientific Data	Publication Date: Jan 18, 2024
Citations: 1	License type: CC BY 4.0

Similar Papers

A Study on the Creative Approach of Luo Zhenyu's Oracle Bone Calligraphy
Zongyang Wu
Pacific International Journal | VOL. 6
01 Jul 2023

The Etymological Sense of Truth in Early China
Youngsam Ha
The International Journal of Chinese Character Studies | VOL. 1
30 Dec 2015

Inscriptions on Bones and Tortoise Carapaces and Digital Age - The Digitization Prospect of Ancient Characteristics (Hieroglyphics)
Indian Journal of Science and Technology | VOL. 9
22 Nov 2016

IsOBS: An Information System for Oracle Bone Script
Xu Han ... Keyue Qiu
01 Jan 2020
