An Index for the Data Size to Extract Decomposable Structures in LAD

Hirotaka Ono, Toshihide Ibaraki, Mutsunori Yagiura

doi:10.1007/3-540-45678-3_25

Abstract

Logical analysis of data (LAD) is a methodology for extracting knowledge, in the form of a Boolean function $f$, from a given pair of data sets $(T, F)$ on an attribute set $S$ of size $n$, where $T \subseteq \{0,1\}^n$ (resp. $F \subseteq \{0,1\}^n$) denotes the set of positive (resp. negative) examples of the phenomenon under consideration. In this paper, we consider the case in which the extracted knowledge has a decomposable structure; i.e., $f$ can be written in the form $f(x) = g(x[S_0], h(x[S_1]))$ for some $S_0, S_1 \subseteq S$ and Boolean functions $g$ and $h$, where $x[I]$ denotes the projection of a vector $x$ onto $I$. To detect meaningful decomposable structures, one expects that the sizes $|T|$ and $|F|$ must be sufficiently large. We provide an index for this indispensable number of examples, based on a probabilistic analysis. Using $p = |T|/(|T| + |F|)$ and $q = |F|/(|T| + |F|)$, we claim that $(T, F)$ has many deceptive decomposable structures if $|T| + |F| \le \sqrt{2^{n-1}/(pq)}$. Computational results on synthetically generated data sets show that this index gives a good lower bound on the indispensable data size.
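The abstract does not say how a decomposable structure is tested. A minimal sketch, assuming $g$ and $h$ may be arbitrary Boolean functions: $(T, F)$ admits $f(x) = g(x[S_0], h(x[S_1]))$ exactly when some $h$ gives $h(a[S_1]) \ne h(b[S_1])$ for every $a \in T$, $b \in F$ with $a[S_0] = b[S_0]$, which is a 2-colouring (bipartiteness) test on the $S_1$-projections. The helper names (`has_decomposable_extension`, `count_decomposable_partitions`, `random_pdbf`, `data_size_index`), the restriction to partitions of $S$ with $|S_0| \ge 1$ and $|S_1| \ge 2$, and the uniform random data generator are hypothetical choices for illustration, not the authors' code or experimental setup.

```python
import itertools
import math
import random
from collections import defaultdict


def project(x, idx):
    """Projection x[I] of a 0/1 tuple x onto the attribute indices idx."""
    return tuple(x[i] for i in idx)


def has_decomposable_extension(T, F, S0, S1):
    """True iff some Boolean g, h give f(x) = g(x[S0], h(x[S1])) consistent with (T, F)."""
    # A positive a and a negative b that agree on S0 can only be told apart
    # by h, so h(a[S1]) != h(b[S1]) is required: an edge in a conflict graph.
    neg_by_s0 = defaultdict(list)
    for b in F:
        neg_by_s0[project(b, S0)].append(project(b, S1))
    adj = defaultdict(set)
    for a in T:
        u = project(a, S1)
        for v in neg_by_s0.get(project(a, S0), []):
            if u == v:
                return False  # a and b agree on S0 and S1: no g can separate them
            adj[u].add(v)
            adj[v].add(u)
    # A suitable h exists iff the conflict graph is 2-colourable (bipartite).
    colour = {}
    for start in adj:
        if start in colour:
            continue
        colour[start] = 0
        stack = [start]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v not in colour:
                    colour[v] = 1 - colour[u]
                    stack.append(v)
                elif colour[v] == colour[u]:
                    return False  # odd cycle: h cannot satisfy all constraints
    return True


def count_decomposable_partitions(T, F, n):
    """Count partitions (S0, S1) of range(n), |S0| >= 1, |S1| >= 2, admitting an extension."""
    attrs = range(n)
    count = 0
    for k in range(2, n):
        for S1 in itertools.combinations(attrs, k):
            S0 = tuple(i for i in attrs if i not in S1)
            if has_decomposable_extension(T, F, S0, S1):
                count += 1
    return count


def data_size_index(n_pos, n_neg, n):
    """The index sqrt(2^(n-1) / (p q)) with p = |T|/(|T|+|F|), q = |F|/(|T|+|F|)."""
    m = n_pos + n_neg
    p, q = n_pos / m, n_neg / m
    return math.sqrt(2 ** (n - 1) / (p * q))


def random_pdbf(n, m, p, seed=0):
    """Hypothetical generator: m distinct random vectors, round(p*m) of them positive."""
    rng = random.Random(seed)
    codes = rng.sample(range(2 ** n), m)
    vectors = [tuple((c >> i) & 1 for i in range(n)) for c in codes]
    k = round(p * m)
    return vectors[:k], vectors[k:]
```

As a usage sketch, one could compare `count_decomposable_partitions` on `random_pdbf(n, m, 0.5)` for data sizes `m` below and above `data_size_index`; random labels carry no real structure, so every decomposition found there is deceptive in the abstract's sense.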
