Abstract

In this paper we propose an explicit computer model for learning natural language syntax based on Angluin's (1982) efficient induction algorithms, using a complete corpus of grammatical example sentences. We use these results to show how inductive inference methods may be applied to learn substantial, coherent subparts of at least one natural language — English — that are not susceptible to the kinds of learning envisioned in linguistic theory. As two concrete case studies, we show how to learn English auxiliary verb sequences (such as could be taking, will have been taking) and the sequences of articles and adjectives that appear before noun phrases (such as the very old big deer). Both systems can be acquired in a computationally feasible amount of time using either positive examples, or, in an incremental mode, with implicit negative examples (examples outside a finite corpus are considered to be negative examples). As far as we know, this is the first computer procedure that learns a full-scale range of noun subclasses and noun phrase structure. The generalizations and the time required for acquisition match our knowledge of child language acquisition for these two cases. More importantly, these results show that just where linguistic theories admit to highly irregular subportions, we can apply efficient automata-theoretic learning algorithms. Since the algorithm works only for fragments of language syntax, we do not believe that it suffices for all of language acquisition. Rather, we would claim that language acquisition is nonuniform and susceptible to a variety of acquisition strategies; this algorithm may be one these.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.