Abstract

In this paper, the K-means algorithm is applied to distributed large data using hybrid clustering techniques. K-means is a simple and scalable algorithm that can be applied to large datasets. It is one of the well-known unsupervised clustering algorithms used to give structure to unstructured data so that valuable information can be extracted. Peer-to-peer (P2P) technologies divide data or resources among the peers in order to manage network bandwidth, network participants and processing power. During data distribution in P2P environments, accuracy, computational complexity and distributed clustering accuracy are important concerns, since shortcomings in any of them reduce overall system performance. This paper therefore considers a system for distributing data in a P2P environment using data mining techniques. The data are distributed using a hybrid MapReduce method, which analyzes large volumes of data by filtering and sorting. The clustering approach analyzes and manages the neighborhood relationships among peer nodes, which helps manage cluster distribution in a dynamic environment. The efficiency of the clusters formed is determined with a hybrid clustering algorithm, and the related system architecture is proposed. Clustering efficiency in the P2P environment is enhanced using the distributed data network. The quality of the clusters formed was evaluated in terms of the Jaccard index, F-measure, mutual information and Rand measure. System performance was analyzed experimentally in terms of error rate, accuracy and time. The multi-objective system helps ease the difficulties of a P2P implementation that is sensitive to initial solutions.

Highlights

  • Peer-to-peer (P2P) has recently become one of the most common technologies for processing different types of data in a distributed environment

  • Peer-to-peer-based clustering is characterized by scalability, the ability to operate in a routerless network and the ability to keep functioning despite changes in nodes or peers

  • The K-means clustering algorithm (Chen and Ho 2006) shares data by exchanging messages between peers, thereby reducing the problems seen in the conventional clustering process


Summary

Motivation

Data analysis is performed with data mining, which analyzes the data and clusters similar items so that they can be distributed efficiently (Nghiem et al 2014). Peer-to-peer-based clustering is characterized by scalability, the ability to operate in a routerless network and the ability to keep functioning despite changes in nodes or peers. Using these characteristics, similar data in the network are identified through neighborhood relationships and clustered together. The data mining process computes over the dataset using exact local algorithms and approximate local algorithms. The K-means clustering algorithm (Chen and Ho 2006) shares data by exchanging messages between peers, thereby reducing the problems seen in the conventional clustering process.
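
The idea of K-means by message exchange between peers can be illustrated with a minimal sketch. It is not the hybrid algorithm proposed in the paper, nor the method of Chen and Ho (2006). It assumes that each peer holds a local shard of points and sends only per-cluster coordinate sums and counts, and it simulates the exchange inside one process. All function names (nearest, local_summary, merge_and_update, p2p_kmeans) and the toy data are hypothetical.

```python
# Hypothetical sketch: each peer holds a local shard of points and shares
# only per-cluster sums and counts (its "message"), never the raw points.
# The message exchange is simulated in a single process.

def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    dists = [sum((p - c) ** 2 for p, c in zip(point, cen)) for cen in centroids]
    return dists.index(min(dists))

def local_summary(points, centroids):
    """Message a peer sends: per-cluster coordinate sums and point counts."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for pt in points:
        k = nearest(pt, centroids)
        counts[k] += 1
        for d in range(dim):
            sums[k][d] += pt[d]
    return sums, counts

def merge_and_update(messages, centroids):
    """Combine the messages of all peers into new global centroids."""
    dim = len(centroids[0])
    updated = []
    for k, old in enumerate(centroids):
        total = sum(counts[k] for _, counts in messages)
        if total == 0:
            updated.append(list(old))  # empty cluster: keep the old centroid
            continue
        updated.append(
            [sum(sums[k][d] for sums, _ in messages) / total for d in range(dim)]
        )
    return updated

def p2p_kmeans(peers, centroids, max_iters=20, tol=1e-9):
    """Iterate until the largest centroid coordinate shift falls below tol."""
    for _ in range(max_iters):
        messages = [local_summary(shard, centroids) for shard in peers]
        updated = merge_and_update(messages, centroids)
        shift = max(
            abs(a - b)
            for new, old in zip(updated, centroids)
            for a, b in zip(new, old)
        )
        centroids = updated
        if shift < tol:
            break
    return centroids

# Three peers, each with its own shard of 2-D points (toy data).
peers = [
    [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0)],
    [(0.0, 1.0), (1.0, 1.0), (11.0, 10.0)],
    [(10.0, 11.0), (11.0, 11.0)],
]
centroids = p2p_kmeans(peers, [[0.0, 0.0], [10.0, 10.0]])
print(centroids)  # [[0.5, 0.5], [10.5, 10.5]]
```

Because only sums and counts cross the network in this sketch, the message size depends on the number of clusters rather than on the number of points each peer holds. The initial centroids are passed in explicitly, which makes the sensitivity of K-means to its starting solution easy to experiment with.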

Methodology
Problem statement
Related works
Objectives
Proposed system
Clustering the selected data
Functions of K-means
Estimating the accuracy of cluster using harmonic search
Performance analysis
Skin segmentation dataset
Adult dataset
Jaccard index
F-measure
Rand measure
Methods
Conclusion
Compliance with ethical standards