Abstract

Recent years have witnessed an increasing interest in computing cosine similarity between high-dimensional documents, transactions, and gene sequences, etc. Most previous studies limited their scope to the pairs of items, which cannot be adapted to the multi-itemset cases. Therefore, from a frequent pattern mining perspective, there exists still a critical need for discovering interesting patterns whose cosine similarity values are above some given thresholds. However, the knottiest point of this problem is, the cosine similarity has no anti-monotone property. To meet this challenge, we propose the notions of conditional anti-monotone property and Support-Ascending Set Enumeration Tree ( SA-SET). We prove that the cosine similarity has the conditional anti-monotone property and therefore can be used for the interesting pattern mining if the itemset traversal sequence is defined by the SA-SET. We also identify the anti-monotone property of an upper bound of the cosine similarity, which can be used in further pruning the candidate itemsets. An Apriori-like algorithm called CosMiner is then put forward to mine the cosine interesting patterns from large-scale multi-item databases. Experimental results show that CosMiner can efficiently identify interesting patterns using the conditional anti-monotone property of the cosine similarity and the anti-monotone property of its upper bound, even at extremely low levels of support.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call