New algorithm for clustering unlabeled big data

Marwan B Mohammed, Wafaa Al-Hameed

doi:10.11591/ijeecs.v24.i2.pp1054-1062

Open Access

Abstract

Clustering analysis techniques play an important role in data mining. Although several clustering techniques exist, there are still attempts to make the clustering process more efficient or to propose new techniques. Such techniques seek to allocate objects into clusters so that two objects in the same cluster are more similar than two objects in different clusters, taking care not to duplicate the same object in different groups while covering as much of the data as possible. This paper makes two contributions. The first is a new algorithm, named the MB algorithm, that collects unlabeled data and places them into appropriate groups. The second is the creation of a lexical sequence sentence (LCS) based on semantically similar sentences, which differs from the traditional lexical word chain (LCW) based on words. The results showed that the MB algorithm generally outperformed both the hierarchical clustering algorithm and the K-means algorithm.
