Statistical learning and analyses of Chinese ancient books for information retrieval

Min Zhang, Zhe Jiang, Ke Huang, Shao-Ping Ma

doi:10.1109/icsmc.2001.973025

Abstract

Full-text retrieval for modern Chinese has been studied for a long time, but the same cannot be said for ancient Chinese books, especially in China. This paper seeks characteristics of ancient Chinese books that can be exploited for information retrieval. Statistical analysis was carried out on a collection of ancient Chinese books totaling over 35,000,000 words, including most of the works in common use. Based on these experiments, several characteristics of ancient Chinese works are analyzed and compared with modern Chinese: the basic unit of ancient works, the proportion of double-character words, sentence length, and the field dependency of ancient Chinese works. We then draw conclusions about ancient Chinese that are useful for information retrieval, especially for building inverted indexes and selecting the index unit. Based on these conclusions, a full-text retrieval system for ancient Chinese books has been designed and implemented. The results show that statistical learning and analysis are a great help in information retrieval for ancient Chinese.
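The kind of statistics and indexing mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' system: the abstract does not say which index unit they selected, so the sketch assumes single characters as the unit and keeps the unit size `n` as a parameter. The delimiter set, the function names (`split_sentences`, `corpus_stats`, `build_inverted_index`, `phrase_search`), and the two short sample passages are all illustrative assumptions.

```python
import re
from collections import Counter, defaultdict

# Assumption: these punctuation marks (and newlines) delimit sentences/clauses.
SENTENCE_DELIMS = "。！？；，、：\n"


def split_sentences(text):
    """Split text on the assumed delimiters and drop empty pieces."""
    pattern = "[" + re.escape(SENTENCE_DELIMS) + "]+"
    return [s for s in re.split(pattern, text) if s]


def char_ngrams(sentence, n):
    """Return all contiguous character n-grams of one sentence."""
    return [sentence[i:i + n] for i in range(len(sentence) - n + 1)]


def corpus_stats(docs):
    """Count single characters and character pairs; compute mean sentence length."""
    unigrams, bigrams, lengths = Counter(), Counter(), []
    for text in docs.values():
        for sentence in split_sentences(text):
            lengths.append(len(sentence))
            unigrams.update(char_ngrams(sentence, 1))
            bigrams.update(char_ngrams(sentence, 2))
    mean_len = sum(lengths) / len(lengths) if lengths else 0.0
    return unigrams, bigrams, mean_len


def build_inverted_index(docs, n=1):
    """Map each n-character unit to {doc_id: [positions]}, skipping delimiters."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos in range(len(text) - n + 1):
            unit = text[pos:pos + n]
            if any(ch in SENTENCE_DELIMS for ch in unit):
                continue
            index[unit].setdefault(doc_id, []).append(pos)
    return index


def phrase_search(index, query):
    """Find documents containing `query` contiguously, using a unigram index."""
    if not query:
        return set()
    hits = set()
    for doc_id, starts in index.get(query[0], {}).items():
        for start in starts:
            # Every later character must occur at the next consecutive position.
            if all(start + k in index.get(ch, {}).get(doc_id, ())
                   for k, ch in enumerate(query)):
                hits.add(doc_id)
                break
    return hits


if __name__ == "__main__":
    # Illustrative sample input only; not taken from the paper's corpus.
    docs = {"lunyu": "學而時習之，不亦說乎。", "daxue": "大學之道，在明明德。"}
    unigrams, bigrams, mean_len = corpus_stats(docs)
    print(unigrams.most_common(3), mean_len)
    print(phrase_search(build_inverted_index(docs, n=1), "明德"))
```

Calling `build_inverted_index(docs, n=2)` instead builds the index over character pairs, which is one way to compare candidate index units on the same corpus.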
