Logics and Statistics for Association Rules and Beyond

Petr Hájek, Jan Rauch

doi:10.1007/978-3-540-48247-5_80

Abstract

The aim of the tutorial is four-fold:

(1) To present a very natural class of logical systems suitable for formalizing, generating and evaluating statements on dependences found in given data. The popular association rules form a particular example, but by no means the only one. Technically, our logical systems are monadic observational predicate calculi, i.e. calculi with generalized quantifiers, only finite models and effective semantics. Do not be put off by these terms; no non-trivial knowledge of predicate logic will be assumed. Logic includes deductive properties, i.e. the possibility of deducing the truth of a sentence from other sentences already found true. Transparent deduction rules are very useful for systematic data mining. Special attention will be paid to sentences formalizing expressions of the form “many objects having a certain combination A of attributes also have a combination B” and, more generally, “combinations A, B of attributes are associated (dependent, correlated etc.) in a precisely defined manner”.

(2) To show how suitable observational sentences are related to statistical hypothesis testing (an aspect that sometimes appears to be unjustly neglected in data mining). A general pattern of statistical inference will be presented in logical terms (theoretical calculi and their relation to observational ones). In particular, the statistical meaning of two variants of “associational rules” as well as of some “symmetric associations” will be explained.

(3) To present a short history of the GUHA method of automatic generation of hypotheses (General Unary Hypothesis Automaton). It is an original Czech method of exploratory data analysis and one of the oldest approaches to KDD or data mining; its principle was formulated long before the advent of data mining. Its theoretical foundations (as presented in [5] and later publications) lead to the theory described in points (1), (2) above.

(4) Finally, to show how modern fuzzy logic (in the narrow sense of the word, i.e. a particular many-valued symbolic logic) may enter the domain of KDD and fruitfully generalize the field. This fourth point will concentrate on open problems and research directions.

The tutorial will be complemented by demonstrations of two recent implementations of the GUHA method.
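
As an illustration of points (1) and (2), a sentence such as “many objects having A also have B” can be evaluated on the four-fold (2x2) contingency table of A and B. The following is a minimal Python sketch, not the authors' implementation: the names `FourFoldTable`, `founded_implication` and `fisher_p_value`, and the thresholds `p` and `base`, are assumptions introduced here. The first function reads “many” as “at least a fraction p of the objects having A, and at least base such objects”; the second computes a one-sided Fisher exact p-value as one possible statistical reading of “A and B are associated”.

```python
from dataclasses import dataclass
from math import comb
from typing import Sequence


@dataclass(frozen=True)
class FourFoldTable:
    a: int  # objects satisfying both A and B
    b: int  # objects satisfying A but not B
    c: int  # objects satisfying B but not A
    d: int  # objects satisfying neither A nor B


def four_fold_table(has_a: Sequence[bool], has_b: Sequence[bool]) -> FourFoldTable:
    """Count the four frequencies from two Boolean columns of equal length."""
    if len(has_a) != len(has_b):
        raise ValueError("columns must have the same length")
    a = b = c = d = 0
    for x, y in zip(has_a, has_b):
        if x and y:
            a += 1
        elif x:
            b += 1
        elif y:
            c += 1
        else:
            d += 1
    return FourFoldTable(a, b, c, d)


def founded_implication(t: FourFoldTable, p: float = 0.9, base: int = 1) -> bool:
    """True when a >= base and a / (a + b) >= p (hypothetical thresholds)."""
    return t.a >= base and t.a >= p * (t.a + t.b)


def fisher_p_value(t: FourFoldTable) -> float:
    """One-sided Fisher exact p-value: the probability of seeing at least
    t.a objects with both A and B when the table margins are held fixed."""
    n = t.a + t.b + t.c + t.d
    r = t.a + t.b  # number of objects satisfying A
    k = t.a + t.c  # number of objects satisfying B
    tail = sum(comb(r, i) * comb(n - r, k - i) for i in range(t.a, min(r, k) + 1))
    return tail / comb(n, k)


# Toy data, invented for illustration only.
has_a = [True, True, True, True, False, False, False, False]
has_b = [True, True, True, False, False, False, False, True]
table = four_fold_table(has_a, has_b)
print(table)
print(founded_implication(table, p=0.75, base=3))
print(fisher_p_value(table))
```

The implication check depends only on the frequencies a and b, whereas the Fisher p-value uses all four frequencies; this is one way to see the difference between an “implicational” and a “symmetric” reading of an association.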
