Abstract

Finding the most interesting correlations among items is essential for problems in many commercial, medical, and scientific domains. For example, what kinds of items should be recommended given what a customer has purchased? How should store shelves be arranged to increase sales? How should a social network be partitioned into communities for successful advertising campaigns? Which individuals on a social network should we target in order to trigger a large cascade of further adoptions? Traditional methods of correlation analysis suffer from both effectiveness and efficiency problems, which this dissertation addresses. We explore the effectiveness problem in three ways. First, we expand the set of desirable properties and study which properties different correlation measures satisfy. Second, we study different techniques for adjusting the original correlation measures and propose two new measures: the Simplified χ² with Continuity Correction and the Simplified χ² with Support. Third, we study the upper and lower bounds of different measures and categorize them by the bound differences. Combining these three directions, we provide guidelines for users to choose the proper measure for their situation. With a proper correlation measure in hand, we then address the efficiency problem for large datasets. We propose a fully-correlated itemset (FCI) framework to decouple the correlation measure from the need for efficient search. By wrapping the desired measure in our FCI framework, we take advantage of the desired measure’s superiority in evaluating itemsets, eliminate itemsets with irrelevant items, and achieve good computational performance. In addition, we identify a 1-dimensional monotone property of the upper bound of any good correlation measure, and different 2-dimensional
