A novel clustering technique for efficient clustering of big data in Hadoop Ecosystem

Sunil Kumar, Maninder Singh

doi:10.26599/bdma.2018.9020037

Abstract

Big data analytics and data mining are techniques used to analyze data and extract hidden information. Traditional approaches to analysis and extraction do not work well for big data because the data is complex and very high in volume. Data clustering, a major data mining technique, groups the data into clusters and makes it easier to extract information from them. However, existing clustering algorithms, such as k-means and hierarchical clustering, are not efficient, as the quality of the clusters they produce is compromised. There is therefore a need for an efficient and highly scalable clustering algorithm. In this paper, we put forward a new clustering algorithm, called hybrid clustering, to overcome the disadvantages of existing clustering algorithms. We compare the new hybrid algorithm with existing algorithms on the basis of precision, recall, F-measure, execution time, and accuracy of results. The experimental results show that the proposed hybrid clustering algorithm is more accurate and has better precision, recall, and F-measure values.
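The abstract names precision, recall, and F-measure as comparison criteria but does not say how they are computed for a clustering. One common convention, assumed here and not confirmed by the paper, is pair counting: two points placed in the same cluster count as a true positive if they also share a ground-truth class. The minimal Python sketch below illustrates that convention; the function name `pairwise_scores` and the toy labels are hypothetical.

```python
from itertools import combinations


def pairwise_scores(true_labels, cluster_ids):
    """Return pair-counting (precision, recall, F-measure) for a clustering.

    Assumption: a pair of points is a "positive prediction" when the
    clustering puts both points in the same cluster, and it is correct
    when the two points also share the same ground-truth label.
    """
    if len(true_labels) != len(cluster_ids):
        raise ValueError("label sequences must have the same length")

    tp = fp = fn = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_class = true_labels[i] == true_labels[j]
        same_cluster = cluster_ids[i] == cluster_ids[j]
        if same_cluster and same_class:
            tp += 1  # pair correctly grouped together
        elif same_cluster:
            fp += 1  # pair grouped together but from different classes
        elif same_class:
            fn += 1  # pair from the same class but split apart

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


if __name__ == "__main__":
    # Hypothetical toy data: five points, two true classes, two clusters.
    truth = ["a", "a", "a", "b", "b"]
    found = [0, 0, 1, 1, 1]
    print(pairwise_scores(truth, found))
```

Pair counting is quadratic in the number of points, so on big data it would normally be applied to a labeled sample or computed from a contingency table instead.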

Highlights

Big data is currently generating a buzz in the market and data is rapidly growing from being measured in gigabytes to terabytes, petabytes, and zettabytes[1]
We propose a new hybrid clustering technique that combines the workings of earlier clustering algorithms
To implement the proposed hybrid clustering technique in Hadoop[24], we chose a dataset of the National Climatic Data Center (NCDC), containing the world’s largest active archive of weather data[25]

Summary

Introduction

Big data is currently generating a buzz in the market and data is rapidly growing from being measured in gigabytes to terabytes, petabytes, and zettabytes[1]. Big data has such large data requirements that applications previously used to store and process data, such as Database Management Systems (DBMS) and Relational Database Management Systems (RDBMS), can no longer meet the demand[2]. Big data includes extremely large datasets, meaning that commonly used software tools cannot manage and process the data within the required time frame[3]. We propose a new hybrid clustering technique to handle big data.

Journal: Big Data Mining and Analytics
Publication Date: Dec 1, 2019
Citations: 30
License type: CC BY
