An empirical approach to syntax learning

Sven Naumann, Jürgen Schrepp

doi:10.1007/978-3-642-77809-4_22

Abstract

This paper outlines a system designed to infer a grammar from a collection of linguistic data (a corpus). An incremental learning algorithm produces a sequence of grammars that approximates the target grammar of the data provided. In each step, a small set of sentences is selected and analysed by a special parser, which produces partial structural descriptions for sentences not covered by the current grammar. The sentence that minimizes the inductive leap for the learner is selected. For this sentence, several hypotheses for completing its partial structural description are formulated and evaluated. The "best" hypothesis is then used to infer a new grammar. This process continues until the corpus is completely covered by the grammar.

Keywords: machine learning of natural language, parsing, inductive inference
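The control flow described in the abstract can be illustrated with a small sketch. Everything concrete in it is an assumption made for illustration and is not taken from the paper: the "grammar" is a toy set of allowed adjacent category pairs, the "partial structural description" is simply the list of pairs the grammar lacks, the inductive leap is measured as the number of such gaps, and a hypothesis is scored by how many corpus sentences it makes fully covered. The names `gaps`, `covered`, `coverage`, `learn`, and `batch_size` are hypothetical. Only the loop structure (select a small batch, pick the sentence with the smallest leap, evaluate hypotheses, adopt the best, repeat until the corpus is covered) follows the abstract.

```python
from typing import List, Set, Tuple

Bigram = Tuple[str, str]
Sentence = Tuple[str, ...]


def gaps(sentence: Sentence, grammar: Set[Bigram]) -> List[Bigram]:
    """Toy stand-in for a partial description: adjacent pairs the grammar lacks."""
    pairs = zip(("<s>",) + sentence, sentence + ("</s>",))
    return sorted({p for p in pairs if p not in grammar})


def covered(sentence: Sentence, grammar: Set[Bigram]) -> bool:
    """A sentence is covered when the grammar licenses every adjacent pair."""
    return not gaps(sentence, grammar)


def coverage(corpus: List[Sentence], grammar: Set[Bigram]) -> int:
    """Number of corpus sentences the grammar covers (assumed scoring criterion)."""
    return sum(covered(s, grammar) for s in corpus)


def learn(corpus: List[Sentence], batch_size: int = 3) -> List[Set[Bigram]]:
    """Return the sequence of grammars produced until the corpus is covered."""
    grammar: Set[Bigram] = set()
    history = [set(grammar)]
    while True:
        uncovered = [s for s in corpus if not covered(s, grammar)]
        if not uncovered:
            return history
        # Select a small set of sentences not yet covered.
        batch = uncovered[:batch_size]
        # Pick the sentence with the smallest inductive leap (fewest gaps).
        target = min(batch, key=lambda s: len(gaps(s, grammar)))
        # Each missing pair is one candidate hypothesis for extending the grammar.
        hypotheses = gaps(target, grammar)
        # Adopt the hypothesis that yields the highest corpus coverage.
        best = max(hypotheses, key=lambda h: coverage(corpus, grammar | {h}))
        grammar = grammar | {best}
        history.append(set(grammar))


if __name__ == "__main__":
    corpus = [("Det", "N", "V"), ("N", "V"), ("Det", "N", "V", "Det", "N")]
    for step, g in enumerate(learn(corpus)):
        print(step, sorted(g))
```

Each iteration adds exactly one new pair, so the loop terminates once every pair occurring in the corpus has been added. A realistic system would replace the pair set with a proper grammar formalism and the gap list with the output of a robust parser.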
