Knowledge Discovery with Second-Order Relations

Rattikorn Hewett, John Leuchner

doi:10.1007/s101150200014

Abstract

This paper presents an induction technique that discovers a set of classification rules from a set of examples, using second-order relations as a representational model. Second-order relations are database relations in which tuples have sets of atomic values as components. Using sets of values, which are interpreted as disjunctions, provides compact representations that facilitate efficient management and enhance comprehensibility. The second-order relational framework is based on theoretical foundations that link relational database theory, machine learning, and logic synthesis. The rule induction technique can be viewed as a second-order relation compression problem in which the original relation, representing training data, is transformed into a second-order relation with fewer tuples by merging tuples in ways that preserve consistency with the training data. This problem is closely related to two-level Boolean function minimization in logic synthesis. We describe a rule-mining system, SORCER, and compare its performance to two state-of-the-art classification systems: C4.5 and CBA. Experimental results based on the average of error rates over 26 data sets show that SORCER, using a simple compression scheme, outperforms C4.5 and is competitive with CBA. Using a slightly more sophisticated compression scheme, SORCER outperforms both C4.5 and CBA.
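To make the compression idea concrete, the following is a minimal sketch, not SORCER's actual algorithm. It assumes a second-order tuple can be modeled as a tuple of value sets plus a class label, that merging is a component-wise union of two tuples with the same label, and that "consistency" means a merged tuple never covers a training example of a different class. The function names (`lift`, `covers`, `merge`, `consistent`, `compress`), the greedy merge order, and the toy data are all hypothetical choices made for illustration.

```python
from itertools import combinations

def lift(example):
    """Turn a first-order example (values, label) into a second-order tuple."""
    values, label = example
    return (tuple(frozenset([v]) for v in values), label)

def covers(sets, values):
    """A tuple covers an example if each value lies in the matching set."""
    return all(v in s for s, v in zip(sets, values))

def merge(a, b):
    """Component-wise union of two second-order tuples (same label assumed)."""
    return tuple(x | y for x, y in zip(a, b))

def consistent(sets, label, examples):
    """The tuple must not cover any training example of a different class."""
    return not any(covers(sets, vals) and lab != label for vals, lab in examples)

def compress(examples):
    """Greedily merge same-class tuples while consistency with the data holds."""
    relation = list(dict.fromkeys(lift(e) for e in examples))  # dedupe, keep order
    changed = True
    while changed:
        changed = False
        for (a, la), (b, lb) in combinations(relation, 2):
            if la != lb:
                continue
            merged = merge(a, b)
            if consistent(merged, la, examples):
                # Replace the two merged tuples with their union and restart.
                relation = [t for t in relation if t not in ((a, la), (b, lb))]
                relation.append((merged, la))
                changed = True
                break
    return relation

# Hypothetical training data: (outlook, windy) -> class label.
examples = [
    (("sunny", "no"), "play"),
    (("overcast", "no"), "play"),
    (("rain", "yes"), "stay"),
    (("sunny", "yes"), "stay"),
]
rules = compress(examples)
for sets, label in rules:
    print([sorted(s) for s in sets], "->", label)
```

Each resulting tuple reads as a rule whose sets are disjunctions, for example "outlook is sunny or overcast, and windy is no, implies play". A greedy pass like this is order-dependent and makes no claim to find the smallest consistent relation, which is where the connection to two-level Boolean minimization becomes relevant.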

Full Text