Learning syntactic rules and tags with genetic algorithms for information retrieval and filtering: An empirical basis for grammatical rules

Robert M Losee

doi:10.1016/s0306-4573(96)85005-9

Abstract

The grammars of natural languages may be learned by using genetic algorithms that reproduce and mutate grammatical rules and part-of-speech tags, improving the quality of later generations of grammatical components. Syntactic rules are randomly generated and then evolve; those rules that yield improved parsing, and occasionally improved retrieval and filtering performance, are allowed to propagate further. The LUST system learns the characteristics of the language or sublanguage used in document abstracts from the document rankings obtained from the parsed abstracts. Unlike approaches that apply traditional linguistic rules to retrieval and filtering, LUST develops grammatical structures and tags without first imposing common grammatical assumptions (e.g. part-of-speech assumptions), producing grammars that are empirically based and optimized for this particular application.
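The abstract does not specify how LUST encodes rules and tags or how it scores them, so the following is only a minimal Python sketch of the general idea, under stated assumptions. Each individual is a hypothetical pair of a word-to-tag lexicon (using anonymous tags `T0`–`T2`, so no part-of-speech categories are assumed) and a set of binary rules `parent -> left right`. Individuals reproduce by crossover, mutate by retagging words or adding and dropping rules, and the fitter half survives each generation. LUST derives fitness from document rankings; here, as an assumed stand-in, fitness is the fraction of a toy corpus that a CYK recognizer can reduce to the start symbol `S`, minus a small per-rule parsimony penalty. All names (`CORPUS`, `TAGS`, `X0`, `evolve`, and so on) are hypothetical.

```python
import random

# Hypothetical toy corpus standing in for document abstracts.
CORPUS = [
    ["genetic", "algorithms", "learn", "rules"],
    ["rules", "evolve"],
    ["parsers", "rank", "documents"],
    ["documents", "evolve"],
]
VOCAB = sorted({w for s in CORPUS for w in s})
TAGS = ["T0", "T1", "T2"]              # anonymous tags: no part-of-speech assumptions
SYMBOLS = TAGS + ["X0", "X1", "S"]     # hypothetical nonterminals; "S" is the start symbol


def parses(sentence, lexicon, rules):
    """CYK recognizer: can the tagged sentence be reduced to "S" by binary rules?"""
    n = len(sentence)
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(sentence):
        chart[i][i + 1].add(lexicon[word])
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for parent, left, right in rules:
                    if left in chart[i][k] and right in chart[k][j]:
                        chart[i][j].add(parent)
    return "S" in chart[0][n]


def fitness(individual):
    """Assumed stand-in fitness: parse coverage minus a per-rule parsimony penalty."""
    lexicon, rules = individual
    coverage = sum(parses(s, lexicon, rules) for s in CORPUS) / len(CORPUS)
    return coverage - 0.01 * len(rules)


def random_rule(rng):
    # Parent is a nonterminal; children may be any symbol.
    return (rng.choice(SYMBOLS[3:]), rng.choice(SYMBOLS), rng.choice(SYMBOLS))


def random_individual(rng):
    lexicon = {w: rng.choice(TAGS) for w in VOCAB}
    rules = frozenset(random_rule(rng) for _ in range(rng.randint(2, 6)))
    return (lexicon, rules)


def crossover(a, b, rng):
    """Uniform crossover on tags; each parent rule is inherited with probability 0.5."""
    lexicon = {w: rng.choice((a[0][w], b[0][w])) for w in VOCAB}
    rules = frozenset(r for r in sorted(a[1] | b[1]) if rng.random() < 0.5)
    return (lexicon, rules)


def mutate(ind, rng, rate=0.3):
    """Return a mutated copy; the parent individual is left unchanged."""
    lexicon, rules = dict(ind[0]), set(ind[1])
    if rng.random() < rate:                    # retag one word
        lexicon[rng.choice(VOCAB)] = rng.choice(TAGS)
    if rng.random() < rate:                    # add a random rule
        rules.add(random_rule(rng))
    if rules and rng.random() < rate:          # drop a random rule
        rules.discard(rng.choice(sorted(rules)))
    return (lexicon, frozenset(rules))


def evolve(generations=40, pop_size=30, seed=0):
    """Truncation-selection GA; returns the best individual and per-generation best fitness."""
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        survivors = population[: pop_size // 2]   # the fitter half propagates unchanged
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = survivors + children
    return max(population, key=fitness), history


best, history = evolve()
print(round(fitness(best), 2), sorted(best[1]))
```

Because the best individual always survives unchanged, the best fitness per generation never decreases; replacing the parse-coverage stand-in with a retrieval-based score (for example, a measure computed from document rankings) would be the step that ties this sketch to the filtering setting the abstract describes.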
