Abstract

This article describes a novel application of Inductive Logic Programming (ILP) to the problem of data mining relational databases. The task addressed here consists in mining a relational database of more than 200,000 ground facts describing 6768 Chinese characters. Mining this relational database may be recast in an ILP setting, where the association rules searched for are represented as nondeterminate Horn clauses, a type of clause known to be computationally hard to learn. We have introduced a new kind of language bias, S-structural indeterminate clauses, which takes into account the meaning of part-of predicates that play a key role in the complexity of learning in structural domains. The ILP algorithm REPART has been specifically developed to learn S-structural indeterminate clauses. Its efficiency lies in a particular change of representation that enables the use of propositional learners. This article presents original results discovered by REPART that exemplify how ILP algorithms may not only scale up efficiently to large relational databases but also discover useful and computationally hard to learn patterns.
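The change of representation mentioned above, commonly called propositionalization in the ILP literature, can be illustrated with a minimal sketch: relational part-of ground facts about characters are flattened into one boolean attribute-value row per character, the form a propositional learner expects. All character and component names below are illustrative placeholders, not data from the paper, and this is not REPART's actual transformation.

```python
# Hypothetical ground facts of the form part_of(Component, Character),
# stored here as character -> set of components. Names are illustrative.
ground_facts = {
    "ming": {"sun", "moon"},
    "hao":  {"sun", "tree"},
    "lin":  {"tree"},
}

def propositionalize(facts):
    """Flatten relational part-of facts into boolean feature vectors.

    Returns the sorted component vocabulary and one row per character,
    where row[i] is True iff the character contains vocabulary[i].
    A propositional learner can then be run on these rows directly.
    """
    vocabulary = sorted({c for parts in facts.values() for c in parts})
    rows = {char: [c in parts for c in vocabulary]
            for char, parts in facts.items()}
    return vocabulary, rows

vocab, rows = propositionalize(ground_facts)
print(vocab)          # sorted component vocabulary
print(rows["ming"])   # boolean feature vector for one character
```

Each row is now an ordinary fixed-length example, so any attribute-value learner can search for patterns that would otherwise require handling nondeterminate clauses in the relational representation.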
