Abstract
Transfer parsing has been used for developing dependency parsers for languages with no treebank by using transfer from treebanks of other languages (source languages). In delexicalized transfer, parsed words are replaced by their part-of-speech tags. Transfer parsing may not work well if a language does not follow uniform syntactic structure with respect to its different constituent patterns. Earlier work has used information derived from linguistic databases to transform a source language treebank to reduce the syntactic differences between the source and the target languages. We propose a transformation method where a source language pattern is transformed stochastically to one of the multiple possible patterns followed in the target language. The transformed source language treebank can be used to train a delexicalized parser in the target language. We show that this method significantly improves the average performance of single-source delexicalized transfer parsers. We also show that, in the multi-source settings, parsers trained using a concatenation of transformed source language treebanks work better when a subset of the source language treebanks is used rather than concatenating all of them or only one. However, the problem of selecting the subset of treebanks whose combination gives the best-performing parser from the set of all the available treebanks is hard. We propose a greedy selection heuristic based on the labelled attachment scores of the corresponding single-source parsers trained using the treebanks after transformation.
Published Version
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have