Abstract
We implement a variant of the algorithm described by Yarowsky and Ngai [21] to induce an HMM part-of-speech (POS) tagger for an arbitrary target language using only an existing POS tagger for a source language and an unannotated parallel corpus between the source and target languages. We extend this work by projecting from multiple source languages onto a single target language. We hypothesize that systematic transfer errors from the differing source languages will cancel out, improving the quality of the bootstrapped resources in the target language. Our experiments confirm this hypothesis. Each experiment compares three cases: (a) all source data comes from a single language A, (b) all source data comes from a single language B, and (c) source data comes from both A and B, half from each. Apart from the choice of source language, all other conditions are held constant across the three cases, including the total amount of source data used. The null hypothesis is that performance in the mixed case would equal the average of the two single-language cases; in fact, mixed-case performance always exceeds the better of the two single-language results. We observed this effect in all six experiments we ran, involving three different source-language pairs and two different target languages.
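To make the projection step concrete, the sketch below shows the core idea in Python: POS tags are carried from a tagged source sentence onto an untagged target sentence through word alignments, and projections from any number of source languages are pooled into shared emission counts for bootstrapping a target-language tagger. This is a minimal illustration under simplifying assumptions (clean 1-to-1 alignments, no noise-robust re-estimation, which the actual algorithm of Yarowsky and Ngai [21] requires); the function names and toy data are hypothetical, not from the paper.

```python
from collections import Counter, defaultdict

def project_tags(target_tokens, source_tags, alignment):
    """Project POS tags from a tagged source sentence onto the target
    sentence through (source_index, target_index) alignment pairs.
    Simplified to 1-to-1 links; real alignments are noisier."""
    projected = [None] * len(target_tokens)
    for src_i, tgt_j in alignment:
        projected[tgt_j] = source_tags[src_i]
    return projected

def collect_emission_counts(projections):
    """Pool projected (token, tag) pairs from any number of source
    languages into emission counts for an initial HMM tagger."""
    counts = defaultdict(Counter)
    for target_tokens, projected_tags in projections:
        for token, tag in zip(target_tokens, projected_tags):
            if tag is not None:  # skip unaligned target tokens
                counts[token][tag] += 1
    return counts

# Toy example: one target sentence receives projections from two
# different source languages (data is illustrative only).
target = ["le", "chat", "dort"]
from_src_a = project_tags(target, ["DET", "NOUN", "VERB"],
                          [(0, 0), (1, 1), (2, 2)])
from_src_b = project_tags(target, ["DET", "NOUN", "VERB"],
                          [(0, 0), (1, 1), (2, 2)])

counts = collect_emission_counts([(target, from_src_a),
                                  (target, from_src_b)])
print({token: dict(tags) for token, tags in counts.items()})
```

In this pooled-counts view, the paper's hypothesis amounts to the claim that alignment and transfer errors made by one source language are unlikely to be repeated by another, so majority evidence in the combined counts is cleaner than counts from either source alone.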