Mining Binary Data with Matrix Algebra

Ritu Chaturvedi, C.I. Ezeife

doi:10.1109/cit/iucc/dasc/picom.2015.135

Abstract

Many applications, such as intelligent tutoring systems (ITS), use data that are better represented as binary data. This paper presents a novel algorithm called MBER (Mining Binary Data Efficiently by Reduced AND operations) for finding frequent itemsets in a binary dataset using matrix algebra operations. Frequent itemsets are sets of items in a transactional database that occur together frequently (where "frequently" is defined by a user-given threshold value called minimum support). Existing algorithms that operate on binary data, such as ABBM, generate frequent itemsets by performing exhaustive AND operations using a brute-force method. MBER, on the other hand, generates frequent itemsets using a novel technique: it first uses matrix algebra operations to find those transactions that have m items in common (called potential transactions) and then performs AND operations only on these potential transactions. This considerably reduces the total number of AND operations required (by less than a quarter) and thereby improves the efficiency of the algorithm. MBER also shows a significant improvement over traditional algorithms that generate frequent itemsets, such as Apriori, by eliminating the need to (i) scan the database more than once and (ii) generate a large number of candidate itemsets. The paper concludes with a proof of correctness of MBER and a discussion of its evaluation.
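The idea of selecting potential transactions before any AND operation can be illustrated with a small sketch. This is not the MBER algorithm itself, whose details are not given in the abstract; it only assumes that "transactions that have m common items" means pairs of transactions sharing at least m items, and that these counts can be read off the product of the binary transaction matrix with its transpose. The dataset and the function names (`common_item_counts`, `potential_pairs`, `and_rows`) are hypothetical.

```python
from itertools import combinations


def common_item_counts(rows):
    """Compute C = B * B^T for a 0/1 matrix B given as a list of rows.

    C[i][j] is the number of items that transactions i and j share.
    """
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in rows] for r1 in rows]


def potential_pairs(rows, m):
    """Return index pairs (i, j), i < j, sharing at least m items (assumed reading)."""
    counts = common_item_counts(rows)
    return [
        (i, j)
        for i, j in combinations(range(len(rows)), 2)
        if counts[i][j] >= m
    ]


def and_rows(r1, r2):
    """Bitwise AND of two binary transactions; 1 marks an item present in both."""
    return [a & b for a, b in zip(r1, r2)]


# Hypothetical toy dataset: 4 transactions (rows) over 4 items (columns).
transactions = [
    [1, 1, 0, 1],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [1, 1, 0, 1],
]

# Only pairs that pass the matrix-product filter are ANDed.
pairs = potential_pairs(transactions, m=2)
shared = {(i, j): and_rows(transactions[i], transactions[j]) for i, j in pairs}
```

In this toy example only three of the six possible transaction pairs pass the filter for m = 2, so only three AND operations are performed instead of six. The actual savings claimed for MBER depend on the dataset and on details of the algorithm that this sketch does not reproduce.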

Full Text