An Integrated Framework for Mixed Data Clustering Using Self Organizing Map

Devaraj Devaraj

doi:10.3844/jcssp.2011.1639.1645

Abstract

Problem statement: Clustering plays an important role in the data mining of large data sets and helps in their analysis, which makes better clustering techniques an important research topic. Several techniques exist for clustering data of a single kind, but only very few exist for clustering mixed data items, so a better clustering technique for the classification of mixed data is required. Clusters must be formed such that the similarity of items within a cluster is increased and the similarity of items from different clusters is reduced. The existing techniques have several advantages but also various disadvantages. Approach: To overcome those drawbacks, the Self-Organizing Map (SOM) and Extended Attribute-Oriented Induction (EAOI) can be used for clustering mixed-type data, but this takes more time for clustering. A modified SOM based on batch learning was proposed. Results: The proposed technique was evaluated on the UCI Adult Data Set. It produced fewer clusters than the SOM, and no outliers were obtained. Conclusion: The experimental results suggest that the proposed technique can be used to cluster mixed data items with better classification accuracy.

Highlights

The intention of analysis before data preprocessing is to gain close knowledge of the data's possibilities and troubles, to find whether the data are sufficient
One of the widely used techniques in data mining (Wang et al, 2010) is clustering
The basic process in data mining
Every training pattern is compared with the units of the map to find the Best Matching Unit (BMU), the unit most similar to the training pattern
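
The BMU search mentioned in the last highlight can be illustrated with a minimal sketch. It assumes purely numeric patterns and Euclidean distance. The function name `find_bmu` and the sample weights are hypothetical, and the paper's handling of categorical attributes in mixed data is not reproduced here.

```python
import math

def find_bmu(pattern, units):
    """Return (index, distance) of the map unit closest to `pattern`.

    Assumption: `pattern` and every entry of `units` are numeric
    vectors of equal length, compared by Euclidean distance.
    """
    best_index, best_dist = -1, math.inf
    for index, weights in enumerate(units):
        dist = math.dist(pattern, weights)  # Euclidean distance
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index, best_dist

# Hypothetical map with three units in a two-dimensional input space.
units = [[0.0, 0.0], [1.0, 1.0], [0.5, 0.0]]
pattern = [0.9, 0.8]
bmu_index, bmu_dist = find_bmu(pattern, units)
```

In a batch-learning setting, a search like this would presumably be run for every training pattern before the unit weights are updated. That ordering is a general property of batch SOM training, not a detail taken from the paper.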

Summary

Introduction

One of the widely used techniques in data mining (Wang et al, 2010) is clustering. Applications of clustering include document grouping, scientific data analysis and customer/market segmentation. Data clustering is a basic data mining technique, and clustering with Gaussian mixture models is widely used. The data mining process consists of six sequential, iterative steps. Clustering (Syurahbil et al, 2009) is the construction of clusters, or homogeneous categories, by dividing the set of objects in the databases. It is useful for purposes such as classification, aggregation and segmentation or dissection.

Methods

Results

Conclusion