Automatic Clustering Using Teaching Learning Based Optimization

M Ramakrishna Murty,Anima Naik,K Parvathi,J V R Murthy,P V G D Prasad Reddy,Suresh C Satapathy

doi:10.4236/am.2014.58111

Abstract

Finding the optimal number of clusters remains a challenging problem in the data mining research community. Several approaches have been suggested to address this issue, including evolutionary computation techniques such as genetic algorithms, particle swarm optimization, and differential evolution. Researchers have also tried many hybrid variants of these approaches. However, determining the optimal number of clusters with good computational efficiency remains open for further research. In this paper, a new optimization technique known as “Teaching-Learning-Based Optimization” (TLBO) is implemented for automatic clustering of large unlabeled data sets. In contrast to most existing clustering techniques, the proposed algorithm requires no prior knowledge of the data to be classified; rather, it determines the optimal number of partitions of the data “on the run”. The new AUTO-TLBO algorithms are evaluated on benchmark datasets (collected from the UCI Machine Learning Repository), and performance comparisons are made with some well-known clustering algorithms. Results show that AUTO-TLBO clustering techniques have much potential in terms of comparative results and computation time.

Highlights

A clustering technique enables one to partition an unlabeled data set into groups of similar objects known as clusters
We compare the performance of the AUTO-TLBO (Teaching-Learning-Based Optimization) algorithm with automatic clustering using improved differential evolution (ACDE) [4], standard hierarchical agglomerative clustering based on the average-link linkage metric [15], genetic algorithm clustering with an unknown number of clusters K (GCUK) [16], dynamic clustering PSO (DCPSO) [17], and an ordinary classical DE-based clustering method
While comparing the performance of the AUTO-TLBO algorithm with other clustering techniques, we focus on two major issues: 1) the ability to find the optimal number of clusters; and 2) the computational time required to find the solution

Summary

Introduction

A clustering technique enables one to partition an unlabeled data set into groups of similar objects known as clusters. Each cluster is clearly different from the other clusters. Evolutionary computation techniques are widely used by researchers to evolve clusters in complex data sets. However, research on determining the optimal number of clusters has not made adequate progress [1]. Clustering techniques based on evolutionary computation mainly take the number of classes K as input instead of determining it during execution. In most cases, determining the appropriate number of clusters in real-time situations is difficult.
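To make the role of K concrete, the following minimal sketch shows a generic centroid-based partition in which the number of clusters is fixed by how many centroids the caller supplies. It is only an illustration under stated assumptions and is not the AUTO-TLBO method: the toy points, the candidate centroids, and the helper names `assign_to_centroids` and `within_cluster_sse` are all hypothetical.

```python
import math

def assign_to_centroids(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    labels = []
    for p in points:
        distances = [math.dist(p, c) for c in centroids]
        labels.append(distances.index(min(distances)))
    return labels

def within_cluster_sse(points, centroids, labels):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(math.dist(p, centroids[k]) ** 2 for p, k in zip(points, labels))

# Hypothetical toy data: two well-separated groups in the plane.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]

# K is an input here: it equals the number of centroids supplied.
centroids_k1 = [(5.0, 5.0)]                  # K = 1 (assumed candidate)
centroids_k2 = [(0.0, 0.0), (10.0, 10.0)]    # K = 2 (assumed candidate)

labels_k1 = assign_to_centroids(points, centroids_k1)
labels_k2 = assign_to_centroids(points, centroids_k2)
sse_k1 = within_cluster_sse(points, centroids_k1, labels_k1)
sse_k2 = within_cluster_sse(points, centroids_k2, labels_k2)
```

In this sketch the partition quality depends entirely on the K chosen in advance. Because the within-cluster error typically shrinks as more centroids are added, this objective cannot select K by itself, which is the gap that automatic clustering methods aim to close.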

Methods

Results

Conclusion