TopPI: An Efficient Algorithm for Item-Centric Mining

Martin Kirchgessner, Sihem Amer-Yahia, Alexandre Termier, Vincent Leroy, Marie-Christine Rousset

doi:10.1007/978-3-319-43946-4_2

Abstract

We introduce TopPI, a new semantics and algorithm designed to mine long-tailed datasets. For each item, regardless of its frequency, TopPI finds the k most frequent closed itemsets that the item belongs to. For example, in our retail dataset, TopPI finds the itemset {nori seaweed, wasabi, sushi rice, soy sauce}, which occurs in only 133 store receipts out of 290 million. It also finds the itemset {milk, puff pastry}, which appears 152,991 times. Thanks to a dynamic threshold adjustment and an adequate pruning strategy, TopPI efficiently traverses the relevant parts of the search space and can be parallelized on multi-core machines. Our experiments on datasets with different characteristics show the high performance of TopPI and its superiority over state-of-the-art mining algorithms. We show experimentally on real datasets that TopPI allows the analyst to explore and discover valuable itemsets.
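The per-item semantics described above can be illustrated with a naive sketch. It is not TopPI's traversal or pruning strategy: it simply enumerates every closed itemset by brute force (only feasible on toy data) and then keeps, for each item, a min-heap of size k whose root acts as that item's current threshold. The function names `closed_itemsets` and `top_k_per_item`, the tie-breaking rule, and the toy receipts are hypothetical choices made for illustration.

```python
import heapq
from itertools import combinations


def closed_itemsets(transactions):
    """Return {closed itemset: support} by brute force (toy data only).

    An itemset X is closed when no proper superset has the same support,
    i.e. when X equals the intersection of all transactions containing X.
    """
    items = sorted(set().union(*transactions))
    closed = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            covering = [t for t in transactions if candidate <= t]
            # Keep the candidate only if it equals its own closure.
            if covering and frozenset.intersection(*covering) == candidate:
                closed[candidate] = len(covering)
    return closed


def top_k_per_item(transactions, k):
    """For every item, keep the k most frequent closed itemsets containing it.

    Each item owns a min-heap of size k; once full, its root support acts as
    that item's threshold, which can only grow as better itemsets arrive.
    Ties in support are broken by comparing the sorted item tuples (an
    arbitrary choice for this sketch).
    """
    transactions = [frozenset(t) for t in transactions]
    heaps = {item: [] for item in set().union(*transactions)}
    for itemset, support in closed_itemsets(transactions).items():
        entry = (support, tuple(sorted(itemset)))
        for item in itemset:
            heap = heaps[item]
            if len(heap) < k:
                heapq.heappush(heap, entry)
            elif entry > heap[0]:
                # Evict the current weakest entry, raising the threshold.
                heapq.heapreplace(heap, entry)
    return {item: sorted(heap, reverse=True) for item, heap in heaps.items()}


receipts = [
    {"milk", "puff pastry"},
    {"milk", "puff pastry", "eggs"},
    {"milk", "eggs"},
    {"nori", "wasabi", "sushi rice"},
]
for item, itemsets in sorted(top_k_per_item(receipts, k=2).items()):
    print(item, itemsets)
```

On these four receipts, the rare item `nori` still receives its own result, `[(1, ('nori', 'sushi rice', 'wasabi'))]`, while `milk` gets `[(3, ('milk',)), (2, ('milk', 'puff pastry'))]`. A single global frequency threshold of 2 would have discarded the nori itemset entirely, which is the situation the item-centric semantics is meant to avoid.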
