Abstract
Oracle bone script is an ancient Chinese writing system engraved on turtle shells and animal bones, serving as a valuable resource for interpreting ancient culture, history, and language. We introduce the Oracle-MNIST dataset, comprising of 28 × 28 grayscale images of 30,222 ancient characters from 10 categories, designed for benchmarking pattern classification, with particular challenges related to image noise and distortion. The training set totally consists of 27,222 images, and the test set contains 300 images per class. Oracle-MNIST follows the same data format with the original MNIST dataset, enabling direct compatibility with all existing classifiers and systems, but it constitutes a more challenging classification task than MNIST. The images of ancient characters suffer from (1) extremely serious and unique noises caused by three-thousand years of burial and aging and (2) dramatically variant writing styles by ancient Chinese, which all make them realistic for machine learning research.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.