Abstract

Logical Analysis of Data is a procedure aimed at identifying relevant features in data sets with both positive and negative samples. The goal is to build Boolean formulas, represented by strings over {0,1,-} called patterns, which can be used to classify new samples as positive or negative. Since a data set can be explained in alternative ways, many computational problems arise related to the choice of a particular set of patterns. In this paper we study the computational complexity of several of these pattern problems (showing that they are, in general, computationally hard) and propose some integer programming models that appear to be effective. We describe an ILP model for finding the minimum-size set of patterns explaining a given set of samples, and another for the problem of determining whether two sets of patterns are equivalent, i.e., whether they explain exactly the same samples. Our first model builds on a polynomial procedure that computes all patterns compatible with a given set of samples. Computational experiments substantiate the effectiveness of our models on fairly large instances. Finally, we conjecture that an effective ILP model for finding a minimum-size set of patterns equivalent to a given set of patterns is unlikely to exist, since the problem is both NP-hard and co-NP-hard.
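As a concrete illustration of the pattern semantics sketched above (the function name below is our own, not taken from the paper), a pattern over {0,1,-} covers a binary vector when every fixed position of the pattern matches the vector, with '-' acting as a wildcard:

```python
def covers(pattern: str, vector: str) -> bool:
    """A pattern over {0,1,-} covers a binary vector if every
    non-'-' position of the pattern equals the vector's bit."""
    return len(pattern) == len(vector) and all(
        p == '-' or p == v for p, v in zip(pattern, vector)
    )

# A set of patterns "explains" a sample if at least one pattern covers it.
patterns = {"1-0", "01-"}
print(covers("1-0", "110"))  # True: positions 0 and 2 match, '-' is free
print(covers("1-0", "011"))  # False: position 0 disagrees
print(any(covers(p, "010") for p in patterns))  # True, via "01-"
```

Under this reading, a classifier built from patterns labels a new sample positive when some pattern in the chosen set covers it.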

Highlights

  • One of the main consequences of the constant progress of technology together with the massive use of computers in many aspects of our lives has been the creation of large repositories of data storing information of all sorts

  • In this paper we focus on some mathematical issues that arise from data mining problems

  • A very common situation for data mining problems is to represent the starting information by a two-dimensional array, in which the rows correspond to samples while the columns correspond to their characteristics


Summary

Introduction

One of the main consequences of the constant progress of technology, together with the massive use of computers in many aspects of our lives, has been the creation of large repositories of data storing information of all sorts. In this setting, finding a minimum-size set of patterns that covers a given set of vectors is called the Pattern Cover Minimality problem. Other problems arising from the analysis of patterns concern whether two different sets of rules explain the same data set, or, in other words, whether the two pattern sets are equivalent. In particular, we would like to know whether a given set of rules explains all possible data, and is therefore in some sense "useless", and, given a set of patterns, whether there exists a smaller set of patterns that explains the same data set.
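The equivalence question above can be made concrete by brute force: two pattern sets are equivalent exactly when they cover the same set of binary vectors, which for small dimension n can be checked by enumerating all 2^n vectors. The sketch below (our own illustration, exponential in n and thus only a sanity check, in contrast to the ILP approach the paper develops) makes that definition executable:

```python
from itertools import product

def covers(pattern: str, vector: str) -> bool:
    """A pattern over {0,1,-} covers a vector if every fixed bit matches."""
    return all(p == '-' or p == v for p, v in zip(pattern, vector))

def explained(pattern_set, n: int) -> set:
    """All length-n binary vectors covered by at least one pattern."""
    return {
        ''.join(bits)
        for bits in product('01', repeat=n)
        if any(covers(p, ''.join(bits)) for p in pattern_set)
    }

def equivalent(set_a, set_b, n: int) -> bool:
    """Brute-force equivalence: both sets explain the same vectors."""
    return explained(set_a, n) == explained(set_b, n)

# {"--"} covers every length-2 vector, and so does {"0-", "1-"}.
print(equivalent({"--"}, {"0-", "1-"}, 2))  # True
print(equivalent({"--"}, {"0-"}, 2))        # False: {"0-"} misses 10 and 11
```

A set like {"--"} that explains all 2^n vectors is exactly the "useless" case mentioned above.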

Basic Definitions
Computational Complexity Results
Compatible Patterns
ILP Models
ILP for Pattern Cover Minimality
ILP for Pattern Equivalence
Computational Experiments
Pattern Cover Minimality
Pattern Equivalence
Diagonal Instances
Generating Equivalent Pattern Sets in General
How to Boost Instances of Pattern Equivalence
Experiments
Conclusions