Abstract

Learning approaches are used to group data into clusters, and semi-supervised learning is widely used to classify both labelled and unlabelled data. A dataset may contain mixed features, consisting of both numeric and categorical data, and these two types differ in their characteristics. Because of these differences, mixed data are better grouped with an ensemble clustering method that uses a split-and-merge approach. In this work, the original mixed dataset is split into a numeric dataset and a categorical dataset, and each is clustered using both traditional clustering algorithms and fuzzy clustering algorithms, with a random subspace approach called fuzzy random forest (FRF). The resulting clusters are combined using ensemble clustering methods and evaluated using both the F-measure and the entropy measure. The results show that splitting the data is beneficial and that fuzzy clustering algorithms give better results than traditional clustering algorithms. The system was tested in a Hadoop multi-node cluster environment as well as in a traditional environment. A hybrid genetic algorithm is used for optimisation.
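The split step and the two evaluation measures named in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulas used, so the code below assumes the commonly used definitions: entropy as the size-weighted average label entropy per cluster (lower is better), and F-measure as the class-size-weighted best F1 score over clusters (higher is better). The function names `split_mixed`, `clustering_entropy`, and `clustering_f_measure` are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def split_mixed(records):
    """Split each mixed record into a numeric part and a categorical part.

    Assumption: numeric features are int/float values and categorical
    features are strings; the paper's actual encoding is not stated.
    """
    numeric, categorical = [], []
    for row in records:
        numeric.append([v for v in row
                        if isinstance(v, (int, float)) and not isinstance(v, bool)])
        categorical.append([v for v in row if isinstance(v, str)])
    return numeric, categorical


def clustering_entropy(clusters, labels):
    """Size-weighted average entropy of the true labels inside each cluster."""
    n = len(labels)
    total = 0.0
    for c in set(clusters):
        members = [lab for k, lab in zip(clusters, labels) if k == c]
        counts = Counter(members)
        h = -sum((m / len(members)) * math.log2(m / len(members))
                 for m in counts.values())
        total += (len(members) / n) * h
    return total


def clustering_f_measure(clusters, labels):
    """For each class take the best F1 over all clusters, weighted by class size."""
    n = len(labels)
    cluster_sizes = Counter(clusters)
    class_sizes = Counter(labels)
    joint = Counter(zip(labels, clusters))
    score = 0.0
    for cls, n_cls in class_sizes.items():
        best = 0.0
        for clu, n_clu in cluster_sizes.items():
            n_both = joint[(cls, clu)]
            if n_both == 0:
                continue
            precision = n_both / n_clu
            recall = n_both / n_cls
            best = max(best, 2 * precision * recall / (precision + recall))
        score += (n_cls / n) * best
    return score


if __name__ == "__main__":
    # Toy mixed records: (numeric, categorical, numeric).
    num, cat = split_mixed([(1.5, "red", 3), (2.0, "blue", 4)])
    print(num, cat)
    # A clustering that matches the labels exactly.
    print(clustering_entropy([0, 0, 1, 1], ["a", "a", "b", "b"]))
    print(clustering_f_measure([0, 0, 1, 1], ["a", "a", "b", "b"]))
```

In this sketch the numeric and categorical parts would each be passed to a separate clustering algorithm, and the two measures would be applied to the merged ensemble result; the clustering, merging, and optimisation steps themselves are not shown.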
