Abstract

Objective: To construct a natural language processing (NLP) system for named entity recognition (NER) and semantic relation extraction (RE) in ancient Chinese medical books that also supports annotated corpus management and semantic knowledge retrieval. Methods: We integrated 47 ontologies and terminologies into a terminology database. We then trained a preprocessing NER model using spaCy and annotated corpora of ancient Chinese medical books with a hybrid approach that combines automated annotation and manual review. Results: A semantic annotation system for ancient Chinese texts, named the traditional Chinese medicine - semantic annotation system (TCM-SAS), was constructed based on these ontologies and terminologies. It enables annotation of, and knowledge retrieval from, ancient TCM texts. Conclusion: TCM-SAS is a user-friendly semantic annotation system for ancient Chinese medical books that includes large-scale manual annotations of TCM literature and semantic knowledge of TCM. It offers users two modes, automatic and manual, for NER and RE on ancient Chinese texts, as well as management of annotated entities and corpora. In the future, it may support the discovery of new knowledge from ancient Chinese medical texts.
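
The hybrid workflow described in the Methods (automated pre-annotation followed by manual review) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract states that a spaCy NER model was trained, whereas the sketch below swaps in a plain longest-match dictionary lookup so that it runs with the Python standard library alone. The names `Span`, `LEXICON` and `pre_annotate`, the entity labels, and the four lexicon entries are all hypothetical and stand in for the 47-source terminology database, which is not reproduced here.

```python
from typing import Dict, List, NamedTuple


class Span(NamedTuple):
    """A candidate entity produced by automated pre-annotation."""
    start: int   # inclusive character offset
    end: int     # exclusive character offset
    text: str
    label: str


# Hypothetical mini-lexicon for illustration only; the labels are assumptions,
# not the entity types used by TCM-SAS.
LEXICON: Dict[str, str] = {
    "桂枝汤": "FORMULA",
    "桂枝": "HERB",
    "人参": "HERB",
    "头痛": "SYMPTOM",
}


def pre_annotate(text: str, lexicon: Dict[str, str]) -> List[Span]:
    """Greedy longest-match lookup of lexicon terms, scanning left to right."""
    if not lexicon:
        return []
    max_len = max(len(term) for term in lexicon)
    spans: List[Span] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first so "桂枝汤" wins over "桂枝".
        for n in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + n]
            if candidate in lexicon:
                spans.append(Span(i, i + n, candidate, lexicon[candidate]))
                i += n
                break
        else:
            i += 1  # no term starts at this character
    return spans


# Candidates would then be shown to a human annotator to accept, fix or reject.
spans = pre_annotate("头痛者，桂枝汤主之", LEXICON)
for span in spans:
    print(span.start, span.end, span.text, span.label)
```

Matching the longest term first keeps a formula name from being split into its component herb, which is one reason a manual review pass remains useful: dictionary hits are only candidates, and ambiguous or nested terms still need a human decision.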

