Abstract

Clustering and classification are two important techniques of data mining. Classification is a supervised learning problem of assigning an object to one of several pre-defined categories based upon the attributes of the object. Clustering, in contrast, is an unsupervised learning problem that groups objects based upon distance or similarity; each group is known as a cluster. In this paper we use ‘Fisher’s Iris Dataset’, containing 5 attributes and 150 instances, to perform an integration of the clustering and classification techniques of data mining. We compared the results of a simple classification technique (using the J48 classifier) with the results of the integrated clustering and classification technique, based upon various parameters, using WEKA (Waikato Environment for Knowledge Analysis), a data mining tool. The results of the experiment show that the integration of clustering and classification gives promising results, with a high accuracy rate and robustness even when the data set contains missing values.

Highlights

  • Data mining is the process of automatic classification of cases based on data patterns obtained from a dataset

  • Each object corresponds to an Iris flower, and the object's class label corresponds to the species of the Iris flower; the paper is a comparative study of the J48 (C4.5) classification algorithm and an integration of Simple KMeans clustering with J48

  • For the integration of the clustering and classification techniques, the Simple KMeans clustering algorithm was first run on the training data set with the class attribute removed, since clustering is unsupervised learning; the J48 classification algorithm was then run on the resulting dataset
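
The two-stage procedure in the last highlight can be illustrated with a minimal Python sketch. It is not the authors' WEKA workflow. It assumes that "the resulting dataset" means the original attributes plus the cluster assignment as one extra attribute, it uses made-up toy measurements rather than the Iris data, and it substitutes a small entropy-based decision tree for J48. All function and variable names are hypothetical.

```python
import math
from collections import Counter

def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means; returns one cluster index per point."""
    # Assumption: deterministic start from k evenly spaced rows (not WEKA's seeding).
    step = len(points) // k
    centroids = [points[i * step] for i in range(k)]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest centroid (Euclidean distance).
        assign = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def build_tree(rows, labels):
    """Greedy entropy-based tree on numeric attributes (simplified stand-in for J48)."""
    if len(set(labels)) == 1:
        return labels[0]                      # pure leaf
    best = None
    for j in range(len(rows[0])):
        for t in sorted({r[j] for r in rows})[:-1]:   # candidate thresholds
            left = [l for r, l in zip(rows, labels) if r[j] <= t]
            right = [l for r, l in zip(rows, labels) if r[j] > t]
            score = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, j, t)
    if best is None:                          # no attribute can split: majority class
        return Counter(labels).most_common(1)[0][0]
    _, j, t = best
    lo = [(r, l) for r, l in zip(rows, labels) if r[j] <= t]
    hi = [(r, l) for r, l in zip(rows, labels) if r[j] > t]
    return (j, t,
            build_tree([r for r, _ in lo], [l for _, l in lo]),
            build_tree([r for r, _ in hi], [l for _, l in hi]))

def predict(tree, row):
    # Internal nodes are (attribute index, threshold, left, right); leaves are labels.
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if row[j] <= t else right
    return tree

# Made-up toy measurements in the style of Iris: (petal length, petal width).
features = [[1.4, 0.2], [1.5, 0.3], [1.3, 0.2],
            [4.5, 1.4], [4.7, 1.5], [4.4, 1.3],
            [6.0, 2.4], [5.9, 2.2], [6.1, 2.5]]
species = ["setosa"] * 3 + ["versicolor"] * 3 + ["virginica"] * 3

# Step 1: cluster with the class attribute removed (only `features` is used).
clusters = kmeans(features, k=3)

# Step 2 (assumed reading): append the cluster index as an extra attribute,
# then train the classifier on the augmented rows with the class labels restored.
augmented = [f + [c] for f, c in zip(features, clusters)]
tree = build_tree(augmented, species)
predictions = [predict(tree, row) for row in augmented]
```

The cluster index is treated here as a numeric attribute for simplicity; a nominal attribute would be closer to how a clustering result is normally represented. Evaluating on the training rows only shows that the pieces fit together and says nothing about the accuracy figures reported in the paper.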

Summary

INTRODUCTION

Data mining is the process of automatic classification of cases based on data patterns obtained from a dataset. A number of algorithms have been developed and implemented to extract information and discover knowledge patterns that may be useful for decision support [2]. Decision tree classifiers are relatively fast compared to other classification methods, and they have obtained similar and sometimes better accuracy than other classification methods [11]. Clustering is the unsupervised classification of patterns into clusters [6]. The community of users has placed a lot of emphasis on developing fast algorithms for clustering large datasets [14]. Clustering groups similar objects together in a cluster (or clusters) and dissimilar objects in other clusters [12]. Based on the combination of the four features of the Iris dataset, Fisher developed a linear discriminant model to distinguish the species from each other.
