Abstract

With the growing popularity of traditional Chinese medicine (TCM) worldwide and the increasing awareness of intellectual property protection, the number of TCM patent applications is growing year by year. TCM patents contain rich medical, legal, and economic information. Effective text mining of TCM patents is of great theoretical and practical significance (e.g., for the R&D of new medicines, patent infringement litigation, and patent acquisition). Named entity recognition (NER) is a fundamental task in natural language processing and a crucial step before in-depth analysis of TCM patents. In this paper, a method combining a bidirectional long short-term memory neural network with a conditional random field (BiLSTM-CRF) is proposed to automatically recognize entities of interest (i.e., herb names, disease names, symptoms, and therapeutic effects) in the abstract texts of TCM patents. Because deep learning methods learn representations directly from data, the semantic information in the context can be captured without manual feature engineering. Experiments show that the BiLSTM-CRF-based method outperforms various baseline methods.
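To make the role of the CRF layer concrete, the following is a minimal sketch of Viterbi decoding over per-token tag scores. It is not the authors' implementation: the BIO tag set, the `HERB` label, and all numeric scores are illustrative assumptions, and in a real BiLSTM-CRF the emission scores would come from the BiLSTM and the transition scores would be learned.

```python
from typing import List, Sequence


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring sequence of tag indices.

    emissions[t][j]   : score of tag j at token t (what a BiLSTM would output).
    transitions[i][j] : score of moving from tag i to tag j (the CRF part).
    """
    if not emissions:
        return []
    n_tags = len(transitions)
    # score[j] = best score of any path that ends in tag j at the current token
    score = list(emissions[0])
    backpointers = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(n_tags):
            # Best previous tag for arriving at tag j
            best_i = max(range(n_tags),
                         key=lambda i: score[i] + transitions[i][j])
            pointers.append(best_i)
            new_score.append(score[best_i] + transitions[best_i][j]
                             + emissions[t][j])
        score = new_score
        backpointers.append(pointers)
    # Follow the back-pointers from the best final tag
    best = max(range(n_tags), key=lambda j: score[j])
    path = [best]
    for pointers in reversed(backpointers):
        best = pointers[best]
        path.append(best)
    path.reverse()
    return path


# Hypothetical tag set and scores, chosen only for illustration.
TAGS = ["O", "B-HERB", "I-HERB"]
emissions = [
    [1.0, 0.8, 0.0],   # token 0: "O" is slightly preferred on its own
    [0.0, 0.2, 1.5],   # token 1: "I-HERB" is strongly preferred
]
transitions = [
    [0.5, 0.0, -10.0],  # from O: moving to I-HERB is heavily penalized
    [0.0, -1.0, 1.0],   # from B-HERB: moving to I-HERB is rewarded
    [0.0, -1.0, 0.5],   # from I-HERB
]

path = viterbi_decode(emissions, transitions)
decoded = [TAGS[i] for i in path]
print(decoded)  # ['B-HERB', 'I-HERB']
```

Choosing the best tag independently for each token would give `O, I-HERB`, which is not a valid entity. The transition scores steer the decoder to `B-HERB, I-HERB` instead, which is the kind of label-consistency constraint a CRF layer adds on top of a BiLSTM.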

Highlights

  • Traditional Chinese medicine (TCM) has a long history and has been passed down for thousands of years

  • Herb names are the most frequently labeled category and are unlikely to be mistaken for any other category, so it makes sense that they are recognized more accurately than the other categories

  • Compared with a bidirectional long short-term memory neural network (BiLSTM) alone, the precision, recall, and F1 value of the proposed model improved by 0.98%, 0.61%, and 0.82%, respectively
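For readers unfamiliar with the metrics reported above, here is a minimal sketch of how precision, recall, and F1 are commonly computed for NER. It assumes BIO tagging and exact-match, entity-level scoring; the page does not say which scheme the authors used, and the labels `HERB`, `DIS`, and `SYM` are hypothetical.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, entity_type)


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a BIO tag sequence.

    An I- tag that does not continue an open entity of the same type
    is ignored in this sketch.
    """
    spans: Set[Span] = set()
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing entity
        continues = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not continues:
            spans.add((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


def precision_recall_f1(gold: Set[Span],
                        pred: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match scores: a prediction counts only if span and type both match."""
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical gold and predicted tags for one five-token sentence.
gold_tags = ["B-HERB", "I-HERB", "O", "B-DIS", "O"]
pred_tags = ["B-HERB", "I-HERB", "O", "O", "B-SYM"]

gold_spans = bio_to_spans(gold_tags)
pred_spans = bio_to_spans(pred_tags)
p, r, f1 = precision_recall_f1(gold_spans, pred_spans)
print(p, r, f1)  # 0.5 0.5 0.5
```

In this toy case one of the two predicted entities is correct and one of the two gold entities is found, so all three scores are 0.5.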

Introduction

TCM has a long history and has been passed down for thousands of years. It is becoming increasingly popular all over the world for its mild medicinal properties and impressive therapeutic effects, especially for certain chronic and intractable diseases. Because the R&D of TCM is a time-consuming and laborious process, fully analyzing the information in TCM patents can largely avoid duplicated research, shorten the R&D cycle, and reduce R&D costs. For this reason, the analysis of TCM patents is becoming an active research topic. Before TCM patents can be analyzed, an essential step is to extract important named entities (e.g., herb names, disease names, symptoms, and therapeutic effects) from the patent texts. These extracted entities can serve as the objects of semantic extension in intelligent patent retrieval and as the input for calculating patent text similarity.

