Mining traffic accident data by subgroup discovery using combinatorial targets

Jeongmin Kim, Kwang Ryel Ryu

doi:10.1109/aiccsa.2015.7507171

Abstract

We use a subgroup discovery algorithm to discover useful knowledge from a traffic accident dataset consisting of many features. Unlike classification learning, subgroup discovery seeks rules that are general and unusual rather than rules that are accurate. Depending on which aspect of the data we are focusing on, we may combine multiple relevant features of interest into a synthetic target feature and give it to the subgroup discovery algorithm. After a set of rules is derived, postprocessing steps are applied to make the ruleset more compact and easier to understand. Experiments with the traffic accident data reveal that the subgroup discovery algorithm must be able to handle skewed data well. Because a combinatorial target necessarily skews the distribution of the target concept, algorithms that try to cover all the examples in the data, rather than only the target, have difficulty deriving good rules for the targeted subgroups.
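As a rough illustration of why a combinatorial target becomes skewed, the following minimal sketch builds a synthetic target from two features and scores two candidate rules. The records, feature names, and values are invented for illustration and are not taken from the traffic accident dataset. The scoring function is weighted relative accuracy (WRAcc), a quality measure commonly used in subgroup discovery to trade off generality against unusualness; the abstract does not say which measure the authors use, so its use here is an assumption.

```python
# Hypothetical toy records: (weather, road, severity, time). All values are invented.
FIELDS = ("weather", "road", "severity", "time")
RAW = [
    ("rain", "curve", "fatal", "night"),
    ("rain", "curve", "fatal", "night"),
    ("rain", "straight", "minor", "night"),
    ("clear", "curve", "fatal", "day"),
    ("clear", "straight", "minor", "day"),
    ("clear", "straight", "minor", "night"),
    ("rain", "straight", "fatal", "day"),
    ("clear", "straight", "minor", "day"),
]
records = [dict(zip(FIELDS, row)) for row in RAW]


def combined_target(rows, wanted):
    """Synthetic target: True only when every listed feature has the wanted value."""
    return [all(r[f] == v for f, v in wanted.items()) for r in rows]


def wracc(rows, target, condition):
    """Weighted relative accuracy: coverage * (precision - base rate)."""
    n = len(rows)
    covered = [t for r, t in zip(rows, target) if condition(r)]
    if not covered:
        return 0.0
    coverage = len(covered) / n            # generality of the rule
    precision = sum(covered) / len(covered)
    base_rate = sum(target) / n            # share of positives in the whole data
    return coverage * (precision - base_rate)  # unusualness weighted by coverage


# Each single feature is positive in half of the records ...
fatal_rate = sum(r["severity"] == "fatal" for r in records) / len(records)
night_rate = sum(r["time"] == "night" for r in records) / len(records)

# ... but their combination is positive in only a quarter of them.
target = combined_target(records, {"severity": "fatal", "time": "night"})
combined_rate = sum(target) / len(records)

# Two candidate rules: a general one and a more specific one.
score_general = wracc(records, target, lambda r: r["weather"] == "rain")
score_specific = wracc(
    records, target, lambda r: r["weather"] == "rain" and r["road"] == "curve"
)
```

In this toy data each added target feature shrinks the positive class, so a learner that spends effort covering the negative examples as well has few positives left to work with.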

Full Text