Abstract

Unsupervised dependency parsing is gaining relevance in Natural Language Processing as an increasing number of utterances becomes available on the Internet. Most current work is based on the Dependency Model with Valence (DMV) (12) or Extended Valence Grammars (EVGs) (11); in both cases, the dependencies between words are modeled using a fixed automata structure. We present a framework for unsupervised induction of dependency structures based on CYK parsing that uses a simple technique for rewriting the training material. Our model is implemented by means of a k-best CYK parser, an inductor for Probabilistic Bilexical Grammars (PBGs) (8), and a simple technique that rewrites the treebank from k trees with their probabilities. An important contribution of our work is that the framework accepts any existing algorithm for automata induction, making the automata structure fully modifiable. Our experiments showed that training size influences parameterization in a predictable manner. This flexibility produced good performance in 8 different languages, in some cases comparable to the state of the art.
