Abstract

Many Chinese characters have more than one written form, owing to the complex ways in which characters were created and the long history of the script's evolution. Most existing Chinese dictionaries list these variant forms but do not explain systematically why a specific character is a variant of another; they cite only a few older key bibliographies, many of which are themselves dictionaries of various forms. In this article we present a new theory and practice for determining whether one Chinese character is a variant of another, and show how a dictionary of variant characters can be deduced automatically, using artificial intelligence techniques, from a corpus of ancient Chinese texts totaling 2.3 billion characters. Of the more than 74,000 variant character groups identified, more than 20,000 are new instances found by our algorithm. We have compiled all the instances into a dictionary, which we call the Dictionary of Chinese Variant Words (異體字詞典, Yiti Zi Cidian). The key insight of our theory is to find synonymous words with variant characters. The dictionary has been online for several years, and anyone can freely access and edit it, much as they do Wikipedia.
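One way to read the key insight is sketched below. This is a minimal illustration and not the authors' algorithm: it assumes that pairs of synonymous words have already been extracted from a corpus (the abstract does not say how), and it treats two characters as variant candidates when they are the only difference between two synonymous words of equal length. The function name `variant_candidates`, the `min_support` threshold, and the sample word pairs are all hypothetical.

```python
from collections import Counter


def variant_candidates(synonym_pairs, min_support=2):
    """Return character pairs that are the sole difference between
    synonymous words, with the number of word pairs supporting each.

    synonym_pairs: iterable of (word_a, word_b) assumed to be synonyms.
    min_support:   hypothetical threshold; pairs seen fewer times are dropped.
    """
    counts = Counter()
    for a, b in synonym_pairs:
        # Only same-length, non-identical words can differ by one character.
        if len(a) != len(b) or a == b:
            continue
        diffs = [(x, y) for x, y in zip(a, b) if x != y]
        if len(diffs) == 1:
            # Sort so that (x, y) and (y, x) count as the same candidate.
            counts[tuple(sorted(diffs[0]))] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


# Hypothetical input: word pairs assumed to be synonymous.
sample_pairs = [
    ("群臣", "羣臣"),
    ("群山", "羣山"),
    ("山峰", "山峯"),
]
result = variant_candidates(sample_pairs, min_support=2)
```

In this toy input, the pair 群/羣 is supported by two word pairs and is kept, while 峰/峯 is supported by only one and is dropped by the threshold. A real system would need far stronger evidence than a single differing character, since many synonymous words differ in one character that is not a variant form.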

