Abstract

The ease of collecting data about customers through the Internet has facilitated the development of large data repositories. These data contain patterns that are useful to decision makers, and knowledge discovery and data mining methods have been widely used to extract them. It is widely acknowledged that about 80% of the resources in most data mining applications are spent on cleaning and preprocessing the data. However, there have been relatively few studies of preprocessing the data used as input to these data mining systems. In this study, we present a feature selection method based on the Hausdorff distance measure and evaluate its effectiveness in preprocessing input data for inducing decision trees. Message traffic data from a Web site are used to illustrate the performance of the proposed method.
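
For readers unfamiliar with the Hausdorff distance named above, the following is a minimal sketch of the standard definition for two finite point sets: the larger of the two directed distances, where each directed distance is the greatest distance from a point in one set to its nearest point in the other. It is not the feature selection method of this study, which the abstract does not detail. The function names, the Euclidean metric, and the sample points are assumptions made for illustration.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def directed_hausdorff(a, b):
    """Greatest distance from any point in `a` to its nearest point in `b`."""
    return max(min(dist(p, q) for q in b) for p in a)


def hausdorff(a, b):
    """Symmetric Hausdorff distance: the larger of the two directed distances."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


# Hypothetical sample data: two small sets of 2-D points.
set_a = [(0.0, 0.0), (1.0, 0.0)]
set_b = [(0.0, 0.0), (4.0, 0.0)]

print(hausdorff(set_a, set_b))
```

The directed distance is not symmetric, which is why the definition takes the maximum over both directions. This brute-force form compares every pair of points, so it is suitable only for small sets.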
