Abstract

News articles in different languages on the same event are invaluable for analyzing standpoints and viewpoints in different countries. The major challenge in identifying such closely related bilingual news articles is how to take full advantage of various information sources, such as length, translation equivalence and publishing date. Accordingly, we propose a discriminative model for bilingual news alignment that can incorporate arbitrary information sources as features. Chinese word segmentation, part-of-speech tagging and named entity recognition are used to calculate the semantic similarities between words or texts, which serve as feature values. The feature weights are optimized with the minimum error rate training algorithm, so that the training objective is tied directly to the evaluation metric. Experiments on Chinese-English data show that our method significantly outperforms two strong baseline systems, by 12.7% and 2.5%, respectively.

Keywords: Bilingual News Alignment; Minimum Error Rate Training; Discriminative Model
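As an illustration of how such a model can combine heterogeneous information sources, the sketch below scores a candidate article pair as a weighted sum of feature values and picks the highest-scoring candidate. It is a minimal sketch under stated assumptions, not the system described here: the `Article` type, the three feature functions, the toy `lexicon` and the hand-set `weights` are all hypothetical, and the weights would in practice be tuned (for example by minimum error rate training) rather than fixed by hand.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Article:
    tokens: list[str]   # segmented words (hypothetical preprocessing output)
    published: date


def length_feature(src: Article, tgt: Article) -> float:
    # Ratio of the shorter to the longer article, in [0, 1].
    a, b = len(src.tokens), len(tgt.tokens)
    return min(a, b) / max(a, b) if max(a, b) else 0.0


def date_feature(src: Article, tgt: Article) -> float:
    # 1.0 for the same day, decaying as the publishing dates drift apart.
    return 1.0 / (1.0 + abs((src.published - tgt.published).days))


def translation_feature(src: Article, tgt: Article, lexicon: dict) -> float:
    # Fraction of source tokens with at least one lexicon translation in the target.
    if not src.tokens:
        return 0.0
    tgt_set = set(tgt.tokens)
    hits = sum(1 for w in src.tokens if lexicon.get(w, set()) & tgt_set)
    return hits / len(src.tokens)


def score(src: Article, tgt: Article, weights: list, lexicon: dict) -> float:
    # Linear model: weighted sum of feature values.
    feats = [
        length_feature(src, tgt),
        date_feature(src, tgt),
        translation_feature(src, tgt, lexicon),
    ]
    return sum(w * f for w, f in zip(weights, feats))


def align(src: Article, candidates: list, weights: list, lexicon: dict) -> int:
    # Index of the candidate with the highest model score.
    return max(range(len(candidates)),
               key=lambda i: score(src, candidates[i], weights, lexicon))


# Toy data, invented for illustration only.
lexicon = {"地震": {"earthquake"}, "日本": {"japan"}}
weights = [1.0, 1.0, 1.0]  # assumed values; not tuned
src = Article(["地震", "日本"], date(2011, 3, 11))
candidates = [
    Article(["football", "match", "result"], date(2011, 3, 11)),
    Article(["earthquake", "japan"], date(2011, 3, 12)),
]
best = align(src, candidates, weights, lexicon)
```

Because the score is a plain weighted sum, a further information source can be added as one more feature function and one more weight without changing the rest of the model.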
