Abstract

As data volumes grow daily, traditional database systems such as DBMS and RDBMS struggle to manage terabytes to petabytes of data, and Big Data technologies offer a solution. Big Data techniques can not only store huge amounts of data but also process data in real time. Once such data is available, various data mining techniques can be applied to gain insights from it. Clustering techniques help extract more information by dividing the main data set into smaller clusters or groups of clusters, making it easier for an algorithm to interpret the data and provide more relevant results. K-means and hierarchical clustering are two popular clustering algorithms. The Big Data ecosystem provides several frameworks, such as MapReduce and Spark, on which data mining techniques can be run. This paper analyzes the performance of the K-means and hierarchical clustering algorithms on Spark and determines which of them performs better. The experimental findings show that, when run on a large-dataset framework, both K-means and hierarchical clustering deliver promising results and performance, demonstrating Spark's capacity to work with machine learning algorithms.
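
To make the idea of dividing a data set into clusters concrete, the following is a minimal single-machine sketch of K-means (Lloyd's algorithm) in plain Python. It is an illustration only and is not the Spark implementation evaluated in the paper; the function name `kmeans`, the sample points, and the choice of the first k points as initial centroids are assumptions made for this example.

```python
from math import dist


def kmeans(points, k, iterations=20):
    """Illustrative Lloyd's algorithm on a list of coordinate tuples.

    Assumption: the first k points serve as initial centroids, which keeps
    the example deterministic. Real implementations usually pick them
    randomly or with a seeding scheme.
    """
    centroids = [tuple(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                # Keep the old centroid if a cluster ends up empty.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # Converged: centroids no longer move.
        centroids = new_centroids
    return centroids, assignments


# Hypothetical sample data: two visually separate groups of 2-D points.
points = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.0), (1.0, 0.5), (9.0, 8.5)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

On this sample, the points near (1, 1) and the points near (9, 9) end up in separate clusters. A distributed framework such as Spark would parallelize the assignment and update steps across partitions of a much larger data set.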

