Abstract

In many applications, it is useful to extract structured data from sections of unstructured text. A common approach is to use pattern matching (e.g., regular expressions) or more general grammar-based techniques. When exact templates or grammar fragments are not known, machine learning approaches based on words or n-grams can be used to identify the structured data. These are generally two-stage (train/use) processes that cannot easily cope with incremental extensions of the training set. In this paper, we combine a fuzzy grammar-based approach with incremental learning. This allows a set of grammar fragments to evolve incrementally each time a new example is given, while guaranteeing that the evolved set can still parse all previously seen examples. We propose a novel measure of overlap between fuzzy grammar fragments that can also be used to determine the degree to which a string is parsed by a grammar fragment. This measure of overlap allows us to compare the ranges of two fuzzy grammar fragments (i.e., to estimate and compare the sets of strings that fuzzily conform to each grammar) without explicitly parsing any strings. A simple application illustrates the validity of the method.
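The abstract does not give the form of the proposed overlap measure. As a purely illustrative sketch of the general idea, and not the measure defined in the paper, the code below assumes a hypothetical representation in which a grammar fragment is a fixed-length sequence of fuzzy sets over coarse token categories (`number`, `word`, `symbol`, produced by an assumed `classify` function). The overlap of two fragments is estimated position by position with a sup-min combination and aggregated with min, so no strings are parsed. The degree to which a concrete string is parsed then falls out as the overlap between a fragment and the crisp fragment built from that string, mirroring the dual use of the measure described above.

```python
from typing import Dict, List

# Hypothetical representation (assumption, not from the paper):
# a fuzzy terminal maps token categories to membership degrees in [0, 1],
# and a grammar fragment is a fixed-length sequence of fuzzy terminals.
FuzzySet = Dict[str, float]
Fragment = List[FuzzySet]


def classify(token: str) -> str:
    """Map a raw token to a coarse category (assumed tokenisation scheme)."""
    if token.isdigit():
        return "number"
    if token.isalpha():
        return "word"
    return "symbol"


def overlap(a: Fragment, b: Fragment) -> float:
    """Estimate how far two fragments accept the same strings, without parsing.

    Each position contributes the sup-min overlap of its two fuzzy sets;
    positions are combined with min. Fragments of different length are
    treated as disjoint (a simplifying assumption of this sketch).
    """
    if len(a) != len(b):
        return 0.0
    degrees = []
    for fa, fb in zip(a, b):
        shared = set(fa) & set(fb)
        degrees.append(max((min(fa[k], fb[k]) for k in shared), default=0.0))
    return min(degrees, default=1.0)


def as_fragment(tokens: List[str]) -> Fragment:
    """Turn a concrete token sequence into a crisp fragment (memberships of 1)."""
    return [{classify(t): 1.0} for t in tokens]


def parse_degree(fragment: Fragment, tokens: List[str]) -> float:
    """Degree to which a token sequence is parsed: overlap with its crisp fragment."""
    return overlap(fragment, as_fragment(tokens))


# Usage with two toy fragments for date-like strings such as "12 / 05".
date_like: Fragment = [{"number": 1.0}, {"symbol": 1.0}, {"number": 1.0}]
loose: Fragment = [{"number": 1.0, "word": 0.6}, {"symbol": 0.8}, {"number": 0.9}]

print(parse_degree(date_like, ["12", "/", "05"]))  # 1.0
print(parse_degree(loose, ["May", "/", "05"]))     # 0.6
print(overlap(date_like, loose))                   # 0.8
```

Using min to aggregate positions means a fragment pair is only as compatible as its weakest position, which keeps the sketch simple; other t-norms or aggregation schemes would be equally plausible choices.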
