Dealing with redundancy is one of the main challenges in frequency-based data mining, and in itemset mining in particular. To tackle this issue as objectively as possible, we introduce the theoretical basis of a new probabilistic concept: mutual constrained independence (MCI). Using this notion, we describe an MCI model for the frequencies of all itemsets, defined by the knowledge of the frequencies of some of the itemsets, which is the least binding in terms of model hypotheses. We provide a method for computing MCI models based on algebraic geometry. We establish the link between MCI models and a class of MaxEnt models that is already known to be used in pattern mining. As such, our research offers further insight into the nature of such models and an entirely novel approach for computing them.