Classification is a popular data mining task in which the value of a discrete (dependent) variable is predicted from the values of several independent variables. In this research, we investigate how predictive classification models can be inferred from the available data. The classification models are required to make good predictions and to be comprehensible and intuitive. Comprehensibility and intuitiveness are of crucial importance in any domain where the model needs to be validated before it can be implemented, such as medical diagnosis and credit scoring. A classification model that is accurate, comprehensible, and intuitive is defined in this thesis as acceptable for implementation. Building such acceptable models is the goal of this thesis. We examine how rule-based classifiers can be built that satisfy these requirements. In a first approach, we use rule extraction from Support Vector Machines (SVMs) to obtain rules that are accurate and comprehensible and that mimic the SVM model as closely as possible. Next, we study the use of artificial ant colonies for classification, attempting to induce acceptable classification models from data. In a final part, we discuss the application of the investigated algorithms to real-life case studies, such as the prediction of defaults, going concern opinions, software faults, and business/ICT alignment.