TopPI: An efficient algorithm for item-centric mining

V Leroy, M Kirchgessner, A Termier, S Amer-Yahia

doi:10.1016/j.is.2016.09.001

Abstract

In this paper, we introduce item-centric mining, a new semantics for mining long-tailed datasets. Our algorithm, TopPI, finds for each item its top-k most frequent closed itemsets. While most mining algorithms focus on the globally most frequent itemsets, TopPI guarantees that each item is represented in the results, regardless of its frequency in the database.

TopPI allows users to efficiently explore Web data, answering questions such as “what are the k most common sets of songs downloaded together with those of my favorite artist?”. When processing retail data consisting of 55 million supermarket receipts, TopPI finds the itemset “milk, puff pastry”, which appears 10,315 times, but also “frangipane, puff pastry” and “nori seaweed, wasabi, sushi rice”, which occur only 1,120 and 163 times, respectively. Our experiments with analysts from the marketing department of our retail partner demonstrate that item-centric mining discovers valuable itemsets. We also show that TopPI can serve as a building block to approximate complex itemset ranking measures such as the p-value.

Thanks to efficient enumeration and pruning strategies, TopPI avoids the search space explosion induced by mining low-support itemsets. We show how TopPI can be parallelized on multi-core machines and distributed on Hadoop clusters. Our experiments on datasets with different characteristics show the superiority of TopPI when compared to standard top-k solutions, and to Parallel FP-Growth, its closest competitor.
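The item-centric semantics described in the abstract can be illustrated with a small brute-force sketch. This is not the TopPI algorithm: it enumerates every itemset, so it only works on toy data and uses none of the enumeration or pruning strategies the abstract mentions. The function names, the toy receipts, and the choice to count an item's own singleton as a candidate are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def closed_itemsets(transactions):
    """Return {closed itemset: support} by exhaustive enumeration (toy data only)."""
    support = defaultdict(int)
    for t in transactions:
        items = sorted(set(t))
        for size in range(1, len(items) + 1):
            for combo in combinations(items, size):
                support[frozenset(combo)] += 1
    # An itemset is closed when no strict superset has the same support.
    return {
        s: n
        for s, n in support.items()
        if not any(s < other and n == m for other, m in support.items())
    }


def top_k_per_item(transactions, k):
    """For each item, keep the k most frequent closed itemsets that contain it."""
    closed = closed_itemsets(transactions)
    candidates = defaultdict(list)
    for itemset, n in closed.items():
        for item in itemset:
            candidates[item].append((n, sorted(itemset)))
    # Sort by decreasing support, breaking ties alphabetically for determinism.
    return {
        item: sorted(cands, key=lambda c: (-c[0], c[1]))[:k]
        for item, cands in candidates.items()
    }


# Hypothetical toy receipts, not data from the paper.
receipts = [
    ["milk", "puff pastry"],
    ["milk", "puff pastry"],
    ["milk", "puff pastry"],
    ["milk"],
    ["frangipane", "puff pastry"],
    ["nori", "wasabi", "sushi rice"],
]
top = top_k_per_item(receipts, k=2)
```

On this toy input, rare items such as "frangipane" and "nori" still receive their own result lists even though their itemsets occur only once, whereas a global top-2 ranking would return only the milk and puff pastry itemsets.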

Journal: Information Systems
Publication Date: Sep 21, 2016
Citations: 9
License type: other-oa

Similar Papers

TopPI: An Efficient Algorithm for Item-Centric Mining
Martin Kirchgessner ... Sihem Amer-Yahia
01 Jan 2015

An Effective Algorithm for Mining Positive and Negative Association Rules
Honglei Zhu ... Zhigang Xu
01 Jan 2008

Mining Frequent Sequences Using Itemset-Based Extension
Ma Zhixin ... Tharam S Dillon
01 Jan 2013

A fast algorithm for mining high average-utility itemsets
Jerry Chun-Wei Lin ... Bay Vo
Applied Intelligence | VOL. 47
11 Mar 2017
