Abstract
In this paper, we construct an automatically generated, quantitative traditional Chinese medicine (TCM) term network in which the relationship between terms is measured by cosine distance. By scanning the corpus, we obtained a set of word vectors whose pairwise relationships can be measured. Clustering these vectors produced a three-level network organized as a category tree. Its leaves correspond to different types of words, giving clusters such as herbs, diseases, and theories of medicine. From each category, we selected the words nearest to the cluster center and asked domain experts to judge whether each word is a valid TCM term not yet collected in existing vocabularies. This yielded a new-word extraction rate of around 70%. Because the network is almost entirely machine-generated, it is much more efficient to construct, and the knowledge it encodes may lead to several new approaches to TCM.
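The selection step described above (measuring cosine distance between word vectors and picking the words nearest to each cluster center) can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes that cluster membership has already been computed, since the abstract does not name the clustering algorithm, and that the "center" is the component-wise mean of the cluster's vectors. The function names `cosine_distance`, `centroid`, and `nearest_to_center` and the toy three-dimensional vectors are hypothetical.

```python
import math


def cosine_distance(u, v):
    """Cosine distance = 1 - cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors (assumed cluster center)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def nearest_to_center(cluster, k):
    """Return the k words whose vectors are closest to the cluster center.

    cluster: dict mapping word -> vector (hypothetical pre-computed cluster).
    """
    center = centroid(list(cluster.values()))
    ranked = sorted(cluster, key=lambda word: cosine_distance(cluster[word], center))
    return ranked[:k]


# Toy, made-up vectors for a hypothetical "herbs" cluster containing one off-topic word.
herbs = {
    "ginseng": [0.9, 0.1, 0.0],
    "licorice": [0.8, 0.2, 0.1],
    "fever": [0.1, 0.9, 0.3],
}
print(nearest_to_center(herbs, 1))  # → ['licorice']
```

In this sketch, the words ranked closest to the center are the candidates that would be passed to experts for review. Words far from the center, such as the toy "fever" entry, rank last.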