Abstract

One of the main problems faced by Data Warehouse (DW) designers is fragmentation. Several studies have proposed data mining-based horizontal fragmentation methods, which focus on optimizing query response time and execution cost to make the DW more efficient. However, to the best of our knowledge, it does not exist a horizontal fragmentation technique that uses a decision tree to carry out fragmentation. Given the importance of decision trees in classification, since they allow obtaining pure partitions (subsets of tuples) in a data set using measures such as Information Gain, Gain Ratio and the Gini Index, the aim of this work is to use decision trees in the DW fragmentation. This chapter presents the analysis of different decision tree algorithms to select the best one to implement the fragmentation method. Such analysis was performed under version 3.9.4 of Weka considering four evaluation metrics (Precision, ROC Area, Recall, and F-measure) for different selected data sets using the SSB (Star Schema Benchmark). Several experiments were carried out using two attribute selection methods: Best First and Greedy Stepwise, the data sets were pre-processed using the Class Conditional Probabilities filter and it was included the analysis of two data sets (24 and 50 queries) with this filter, to know the behavior of the decision tree algorithms for each data set. Once the analysis was concluded, we can determine that for 24 queries data set the best algorithm was RandomTree since it won in two methods. On the other hand, in the data set of 50 queries, the best decision tree algorithms were LMT and RandomForest because they obtained the best performance for all methods tested. Finally, J48 was the selected algorithm when neither an attribute selection method nor the Class Probabilities filter are used. But, if only the latter is applied to the data set, the best performance is given by the LMT algorithm.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call