Extracting and digitizing drug attributes from the medical literature is the first step toward building a knowledge computing system for precision disease treatment. To build a cardiovascular drug knowledge base, this paper proposes a multi-label text classification method for cardiovascular drug attributes in Chinese drug guidelines. Drug attribute text is represented with a pre-trained BERT model, and a dual-feature extraction structure based on the BiGRU neural network is proposed to capture high-level semantic information. The method assigns category labels to cardiovascular drug attributes such as indications and mode of administration, and achieves an F1 score of 0.8431 under 5-fold cross-validation. It is compared against KNN and Naïve Bayes baselines, and against CNN and BiGRU control models built on Word2Vec representations of the medication guidelines; the proposed method proves effective and yields a significantly higher F1 score. Ablation and crossover experiments further confirm the method, which reaches an average accuracy of 0.8339.
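The evaluation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the BERT + BiGRU encoder is not reproduced, and the classifier is assumed to emit one logit per attribute label. Each label then gets an independent sigmoid with a 0.5 threshold (the usual multi-label decision rule; the paper's actual threshold is not stated), F1 is micro-averaged over all labels (the averaging scheme is an assumption), and the 5 folds are contiguous index blocks. The helper names `predict_labels`, `micro_f1` and `k_fold_indices` are hypothetical.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    # Numerically stable logistic function (avoids overflow for large |x|).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def predict_labels(logits: Sequence[float], threshold: float = 0.5) -> List[int]:
    # Multi-label decision: each label is scored independently, so one
    # sentence may carry several attribute labels at once.
    # The 0.5 threshold is an assumption, not taken from the paper.
    return [1 if sigmoid(z) >= threshold else 0 for z in logits]


def micro_f1(y_true: Sequence[Sequence[int]], y_pred: Sequence[Sequence[int]]) -> float:
    # Micro-averaged F1: pool TP/FP/FN counts over every (sample, label) pair.
    tp = fp = fn = 0
    for t_row, p_row in zip(y_true, y_pred):
        for t, p in zip(t_row, p_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def k_fold_indices(n: int, k: int = 5) -> List[List[int]]:
    # Split sample indices 0..n-1 into k contiguous folds whose sizes
    # differ by at most one; each fold serves once as the test set.
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


if __name__ == "__main__":
    # Hypothetical label set: only "indication" and "administration"
    # come from the abstract; the other two are illustrative.
    labels = ["indication", "administration", "dosage", "contraindication"]
    logits = [[2.0, -1.0, 0.5, -3.0], [-2.0, 1.5, -0.2, 0.1]]
    gold = [[1, 0, 0, 0], [0, 1, 0, 1]]
    preds = [predict_labels(row) for row in logits]
    print(preds)
    print(round(micro_f1(gold, preds), 4))
    print([len(f) for f in k_fold_indices(12, 5)])
```

In a full 5-fold setup, `micro_f1` would be computed on each held-out fold and the five scores averaged; the toy data here yields three true positives and one false positive, so precision is 0.75, recall is 1.0 and micro-F1 is 6/7.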