Abstract

This paper proposes a new criterion, minimum tag error (MTE), for discriminative training of conditional random fields (CRFs). The new criterion, a smoothed approximation to the sentence labeling error, aims to maximize the average tagging accuracy over all possible sentences, weighted by their probabilities. Corpora from the second international Chinese word segmentation bakeoff (Bakeoff 2005) are used to test the effectiveness of this training criterion. Experimental results show that the proposed MTE criterion reliably improves on the initial performance of supervised CRFs. In particular, the recall rate of out-of-vocabulary words (R_oov) is significantly improved compared with that obtained using standard CRFs. Furthermore, the new training method yields robust segmentation performance across all datasets.
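The probability-weighted average accuracy described above can be illustrated with a small brute-force sketch. It is not the paper's implementation: the BMES tag set, the toy emission and transition scores, and the function names are all assumptions made for illustration, and "all possible sentences" is read here as all candidate tag sequences for one input. A practical trainer would compute this expectation with dynamic programming rather than by enumeration.

```python
from itertools import product
from math import exp

TAGS = ("B", "M", "E", "S")  # assumed BMES word-segmentation tag set


def sequence_score(tags, emissions, transitions):
    """Unnormalized log-score of one tag sequence under a toy linear-chain model."""
    score = sum(emissions[i][t] for i, t in enumerate(tags))
    score += sum(transitions.get((a, b), 0.0) for a, b in zip(tags, tags[1:]))
    return score


def tag_accuracy(tags, reference):
    """Fraction of positions whose tag matches the reference tag."""
    return sum(a == b for a, b in zip(tags, reference)) / len(reference)


def expected_tag_accuracy(emissions, transitions, reference):
    """Average accuracy over every candidate sequence, weighted by its probability."""
    candidates = list(product(TAGS, repeat=len(reference)))
    weights = [exp(sequence_score(c, emissions, transitions)) for c in candidates]
    z = sum(weights)  # partition function over all candidate sequences
    return sum(w / z * tag_accuracy(c, reference) for c, w in zip(candidates, weights))


# Hypothetical two-character input whose reference segmentation is one word.
reference = ("B", "E")
uniform = [{t: 0.0 for t in TAGS} for _ in reference]
peaked = [
    {"B": 3.0, "M": 0.0, "E": 0.0, "S": 0.0},
    {"B": 0.0, "M": 0.0, "E": 3.0, "S": 0.0},
]

print(expected_tag_accuracy(uniform, {}, reference))
print(expected_tag_accuracy(peaked, {}, reference))
```

Because the objective is a smooth function of the model scores, shifting probability mass toward sequences with more correct tags raises it continuously, which is what makes it usable as a training criterion in place of the raw, non-differentiable error count.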
