Abstract
This letter introduces a new method for automatically acquiring paraphrases from bilingual corpora. The method uses bilingual dependency relations, which are obtained by projecting a monolingual dependency parse onto the sentence in the other language using statistical alignment techniques. Because the bilingual context of the dependency relations allows the proposed method to disambiguate the sense of the original phrases, it is possible to obtain paraphrases that are interchangeable in a given context. Through experiments on Korean-English parallel corpora, we demonstrate that our method extracts paraphrases with high precision, achieving success rates of 94.3% for Korean and 84.6% for English.

Keywords: Paraphrase, bilingual dependency parsing, alignment, sense disambiguation, dependency relation.

I. Introduction

Approaches based on bilingual corpora are promising for the automatic acquisition of translation knowledge. Phrase-based statistical machine translation (SMT) models have advanced the state of the art in machine translation by expanding the basic unit from words to phrases [1], [2]. However, phrase-based SMT techniques suffer from data sparseness problems: translation probabilities for low-frequency phrases are unreliable, and coverage is low, in that many phrases encountered at run-time are not observed in the training data. One way to address these problems is to use paraphrases. In this study, we introduce a method of automatically acquiring paraphrases to smooth the translation parameters and to increase the coverage of translation knowledge. One previous approach