Abstract

Objective: This paper uses natural language processing (NLP) to analyze and process the text of “Treatise on Febrile Diseases (TFDs)” in order to extract important information, as an attempt to apply NLP to text mining of traditional Chinese medicine (TCM) literature.

Materials and Methods: The experiment was implemented in Python with NLP toolkits and libraries including Jieba, NLTK, gensim, and scikit-learn (sklearn), together with Excel and Word. The text of “TFDs” was cleaned, segmented into words, and filtered to remove stop words. Word frequency statistics and analysis, keyword extraction, named entity recognition (NER), and other operations were then performed, and text similarity was calculated last.

Results: Jieba accurately identified the herb names in “TFDs.” Word frequency statistics based on the segmentation show that “warm therapy” is an important treatment in “TFDs,” that Guizhi decoction is the main prescription, and that five core decoctions can be identified. Keyword extraction based on the term frequency-inverse document frequency (TF-IDF) algorithm performed well. The accuracy of NER on “TFDs” is about 86%. When similarity is calculated with a latent semantic indexing (LSI) model, “Understanding of Synopsis of Golden Chamber (SGC)” is much more similar to “SGC” than to “TFDs.” The results meet expectations.

Conclusions: This work lays a research foundation for applying NLP to text mining of unstructured TCM literature. Combined with deep learning, NLP, as an important branch of artificial intelligence, is expected to find broader applications in text mining of TCM literature, construction of TCM knowledge graphs, and TCM knowledge services.
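The preprocessing, word-frequency, keyword-extraction, and similarity steps named in the Materials and Methods can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' code: the clauses are hypothetical toy input loosely modeled on lines of the Treatise and segmented by hand (in the paper's setup a segmenter such as Jieba would supply the tokens, for example via `jieba.lcut`), the stop-word list is a made-up placeholder, and TF-IDF uses the textbook form tf × log(N/df). The similarity step here is plain cosine similarity on TF-IDF vectors. The LSI model in the abstract would first project these vectors onto a low-rank latent space with a truncated SVD (for example gensim's `LsiModel`), which this sketch leaves out.

```python
import math
from collections import Counter

# Hypothetical stop-word list for illustration only; a real classical-Chinese
# stop-word list for TCM texts would be far longer.
STOP_WORDS = {"之", "者", "也", "而", "其"}
PUNCTUATION = set("，。、；：！？“”（）")

# Hypothetical toy clauses, hand-segmented; in practice a segmenter such as
# jieba.lcut would produce these token lists from the raw text.
segmented_clauses = [
    ["太阳病", "，", "头痛", "发热", "，", "汗出", "恶风", "，", "桂枝汤", "主之", "。"],
    ["太阳病", "，", "头痛", "发热", "，", "身疼", "腰痛", "，", "无汗", "而", "喘", "者", "，", "麻黄汤", "主之", "。"],
    ["少阴病", "，", "脉", "沉", "者", "，", "急温之", "，", "宜", "四逆汤", "。"],
]


def clean_and_filter(tokens):
    """Remove punctuation-only tokens and stop words from one segmented clause."""
    return [
        t for t in tokens
        if t not in STOP_WORDS and not all(ch in PUNCTUATION for ch in t)
    ]


def word_frequencies(docs):
    """Count how often each term occurs across all documents."""
    counts = Counter()
    for doc in docs:
        counts.update(doc)
    return counts


def tf_idf(docs):
    """Return one {term: weight} dict per document, weight = (count / length) * log(N / df)."""
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency: each term counted once per document
    result = []
    for doc in docs:
        if not doc:
            result.append({})
            continue
        tf = Counter(doc)
        result.append({t: (c / len(doc)) * math.log(n_docs / df[t]) for t, c in tf.items()})
    return result


def top_keywords(weights, k=3):
    """Highest-weighted terms first; ties broken by code-point order for determinism."""
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


docs = [clean_and_filter(clause) for clause in segmented_clauses]
freqs = word_frequencies(docs)
weights = tf_idf(docs)
print(freqs.most_common(4))
print([top_keywords(w) for w in weights])
print(cosine(weights[0], weights[1]), cosine(weights[0], weights[2]))
```

Terms shared by several clauses (such as the syndrome name) receive a lower IDF weight than prescription names confined to one clause, which is why the prescription names surface as keywords. Sparse dicts keep the sketch dependency-free; on a full corpus, sklearn's `TfidfVectorizer` (which smooths the IDF and L2-normalizes by default, so its weights differ from this sketch) or Jieba's `jieba.analyse.extract_tags` would be the practical choices.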
