Abstract

Deep unification- (constraint-)based grammars are usually hand-crafted. Scaling such grammars from fragments to unrestricted text is time-consuming and expensive. This problem can be exacerbated in multilingual broad-coverage grammar development scenarios. Cahill et al. (2002, 2004) and O’Donovan et al. (2004) present an automatic f-structure annotation-based methodology to acquire broad-coverage, deep, Lexical-Functional Grammar (LFG) resources for English from the Penn-II Treebank. In this paper we show how this model can be adapted to a multilingual grammar development scenario to induce robust, wide-coverage, PCFG-based LFG approximations for German from the TIGER Treebank. We show how the architecture of LFG, in particular the distinction between c-structure and f-structure representations, facilitates multilingual, treebank-based unification grammar induction, allowing us to cross-linguistically reuse the lexical extraction and parsing modules from O’Donovan et al. (2004) and Cahill et al. (2004), respectively. We evaluate our grammars against the PARC 700 Dependency Bank (King et al., 2003), against dependency structures for 2000 held-out sentences from the TIGER Corpus, and against a hand-crafted dependency gold standard for 100 TIGER trees. Currently, our resources achieve an f-score of 81.79% against the PARC 700, a 2.19% improvement over the best result reported for a hand-crafted grammar in Kaplan et al. (2004); 74.6% against the 2000 held-out TIGER dependency structures; and 71.08% against the 100-sentence TIGER gold standard, with substantially improved coverage compared to hand-crafted resources. We have since applied our methodology to induce wide-coverage LFG resources for Chinese (Burke et al., 2004b) from the Penn Chinese Treebank (Xue et al., 2002) and for Spanish from the CAST3LB Treebank (Civit, 2003).
