Optical character recognition and the smart ancient script database

Zhiji Liu

doi:10.1177/2513850220967758

Abstract

The initial success of optical character recognition (OCR) for ancient scripts has opened the floodgates for ‘smart’ ancient script research, which in turn requires the support of a smart ancient script database. To compile the big data necessary for this research, smart ancient script database software must be able to recognize all aspects and all levels of all ancient script materials. Therefore, in addition to integrating OCR functionality into this software, the other primary imperative is to develop a new digitized ancient script data system, one that is supplemented on a full scale to include all available materials as well as newly input image data. These data must cover variant graphic forms, variant written forms, handwriting, graphic components, calligraphic styles, and the other inexhaustible variations in script construction. The database must contain a multi-level framework with an annotated arrangement of the fullest range of word meanings in linguistic context. It must also contain a digitally integrated, multiple-path indexed arrangement of the important paleographical interpretations in the field. Our strategy for constructing this smart ancient script database is to advance algorithm writing and data input work simultaneously and in mutual support, following an open-source, community-supported model, making this project an exercise in interdisciplinary collaboration within the paleography community.

Full Text