This article describes the entry of the Super Computer Data Mining (SCDM) Project to the 10th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD) 2006 Data Mining Competition. The SCDM project is developing data mining tools for parallel execution on Linux clusters. The code is freely available; please contact the first author for a copy. We combine several classifiers, some of which are themselves ensemble techniques, into a heterogeneous meta-ensemble that produces a probability estimate for each test case. We then use a simple decision-theoretic framework to turn each estimate into a classification. The meta-ensemble contains a Bayesian neural network, a learning classifier system (LCS), attribute-selection-based ensemble algorithms (Filtered Attribute Subspace based Bagging with Injected Randomness [FASBIR]), and better-known classifiers such as logistic regression, Naive Bayes (NB), and C4.5.
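The two-stage idea (combine member probabilities, then classify by expected cost) can be illustrated with a minimal sketch. The abstract does not state how the member estimates are combined or which misclassification costs are used, so the weighted average, the two-class cost parameters, and the function names `combine_probabilities` and `classify_by_expected_cost` are all assumptions made for illustration, not the SCDM implementation.

```python
from typing import List, Optional, Sequence


def combine_probabilities(
    member_probs: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> List[float]:
    """Combine per-member positive-class probabilities into one estimate per case.

    member_probs[m][i] is member m's probability that test case i is positive.
    ASSUMPTION: a (weighted) average; the source does not specify the rule.
    """
    n_members = len(member_probs)
    if weights is None:
        weights = [1.0] * n_members
    total = sum(weights)
    n_cases = len(member_probs[0])
    return [
        sum(w * probs[i] for w, probs in zip(weights, member_probs)) / total
        for i in range(n_cases)
    ]


def classify_by_expected_cost(
    probs: Sequence[float],
    cost_fp: float = 1.0,  # ASSUMPTION: cost of a false positive
    cost_fn: float = 1.0,  # ASSUMPTION: cost of a false negative
) -> List[int]:
    """Pick the label with the lower expected cost for each case.

    Predicting negative costs p * cost_fn in expectation;
    predicting positive costs (1 - p) * cost_fp.
    """
    return [1 if p * cost_fn > (1.0 - p) * cost_fp else 0 for p in probs]
```

With equal costs this reduces to thresholding the combined probability at 0.5; raising `cost_fn` relative to `cost_fp` lowers the threshold, so more cases are labelled positive.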