Abstract

The Swedish prescribed drug register contains dose instructions as written by the physician. A challenge is to convert this free text into a number of doses per day, which can then be used to calculate, for example, the duration of treatment. The objective of this study is to compare algorithms for named entity recognition used to extract dosage per day. Two sequence models, the Hidden Markov Model (HMM) and Conditional Random Fields (CRF), were used to predict label sequences. The HMM and CRF were compared using several measures of prediction quality: precision, recall, F-score and accuracy. We also evaluated how prediction was affected by including more labels and features: the two CRF models both used 12 labels, with 2 and 11 feature types respectively, and the three HMM models used 12, 15 and 18 labels respectively. Using the predicted labels, a rule-based algorithm was used to predict dosage per day. Prediction of dosage per day was evaluated using accuracy. Label prediction: As expected, increasing the number of labels or features increased the F-score. The CRF model with 11 feature types had an F-score of 0.989, compared to 0.972 using two feature types. The HMM models with 15 and 18 labels both achieved an F-score of 0.986, compared to 0.966 using 12 labels. In terms of precision and recall, the performance of the CRF and HMM varied. Dosage prediction: The CRF model with 11 feature types achieved 97.2% accuracy. The HMM with 15 labels achieved a slightly higher accuracy than the HMM with 18 labels (95.7% versus 95.5%). The CRF had the highest accuracy in both label prediction and dosage per day prediction. The HMM also achieved high accuracy, though generally lower than the CRF. We recommend CRF over HMM for named entity recognition on prescription text; it is time efficient and predicts dosage per day with high accuracy.
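
To make the final step concrete, the rule-based conversion from predicted labels to doses per day might look like the following minimal Python sketch. The label names (QUANTITY, FREQUENCY, PERIOD, FORM, O), the PERIOD_DAYS lookup, the English tokens and the function doses_per_day are all assumptions made for illustration; the abstract does not specify the study's label set or its rules.

```python
# Hypothetical lookup from a period word to its length in days.
PERIOD_DAYS = {"daily": 1.0, "weekly": 7.0}


def doses_per_day(tokens, labels):
    """Combine labeled tokens into a doses-per-day value.

    Assumed rule: (quantity per intake) * (intakes per period) / (period in days).
    Any component that is not labeled in the text defaults to 1.
    """
    quantity = 1.0
    frequency = 1.0
    period_days = 1.0
    for token, label in zip(tokens, labels):
        if label == "QUANTITY":
            quantity = float(token)
        elif label == "FREQUENCY":
            frequency = float(token)
        elif label == "PERIOD":
            period_days = PERIOD_DAYS[token]
    return quantity * frequency / period_days


# "1 tablet 2 times daily", with labels as a sequence model might predict them.
tokens = ["1", "tablet", "2", "times", "daily"]
labels = ["QUANTITY", "FORM", "FREQUENCY", "O", "PERIOD"]
print(doses_per_day(tokens, labels))  # 2.0
```

In a sketch like this, an error in a single predicted label (for example FREQUENCY mistaken for QUANTITY) can change the computed dose, which is one reason to evaluate dosage accuracy separately from label-level F-score.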
