Abstract

Big data refers to the study of large datasets characterized by velocity, variety, and volume. When a dataset has a limited number of clusters, few dimensions, and a small number of data points, existing traditional clustering algorithms can be used. In the internet age, however, data is growing very fast, and existing clustering algorithms no longer give acceptable results in terms of time and space complexity. There is therefore a need for a new approach that clusters large volumes of data with low time and space complexity. This work applies the MapReduce and Hadoop framework to different clustering algorithms: k-means, Canopy clustering, and a proposed algorithm. The analysis shows that processing a large volume of data takes low time and space complexity when compared with a small volume of data.

Highlights

  • As data increases in volume, variety, and velocity, existing clustering algorithms take more time to produce results

  • MapReduce is a programming model for processing large volumes of data in parallel. MapReduce together with HDFS, a combination commonly known as Hadoop, can be used to handle big data. Once a file is placed into HDFS, it can be read any number of times

  • The execution time of the k-means clustering algorithm is given by O(n · k · i · d), where n is the number of data points, k is the number of clusters, i is the number of iterations needed to converge, and d is the number of dimensions
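The complexity stated in the last highlight can be made concrete with a small sketch. The Python below is an illustrative, single-machine k-means and not the paper's implementation; the function name `kmeans`, the random initialization, and the `coordinate_ops` counter are assumptions introduced here. The counter records the per-coordinate work of the assignment step, which is n · k · d per iteration and therefore n · k · i · d over i iterations.

```python
import random


def kmeans(points, k, max_iters=100, seed=0):
    """Plain k-means on a list of equal-length tuples (illustrative only).

    Returns (centroids, iterations, coordinate_ops), where coordinate_ops
    counts the per-coordinate operations spent in the assignment step.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumption: random initial centroids
    d = len(points[0])
    coordinate_ops = 0
    iterations = 0
    for _ in range(max_iters):
        iterations += 1
        clusters = [[] for _ in range(k)]
        for p in points:  # n data points
            best, best_dist = 0, float("inf")
            for j, c in enumerate(centroids):  # k centroids
                # Squared Euclidean distance touches all d coordinates.
                dist = sum((a - b) ** 2 for a, b in zip(p, c))
                coordinate_ops += d
                if dist < best_dist:
                    best, best_dist = j, dist
            clusters[best].append(p)
        # Recompute each centroid as the mean of its members;
        # an empty cluster keeps its previous centroid.
        new_centroids = [
            tuple(sum(col) / len(members) for col in zip(*members))
            if members else centroids[j]
            for j, members in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: no centroid moved
            break
        centroids = new_centroids
    return centroids, iterations, coordinate_ops
```

For any input, the returned `coordinate_ops` equals `len(points) * k * iterations * d`, which is the n · k · i · d term above.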


Summary

Introduction

As data increases in volume, variety, and velocity, existing clustering algorithms take more time to produce results. To produce results in less time and with less memory, one should turn to parallel programming. MapReduce is a programming model for processing large volumes of data in parallel. MapReduce together with HDFS, a combination commonly known as Hadoop, can be used to handle big data. Once a file is placed into HDFS, it can be read any number of times.
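To illustrate how a clustering step fits the MapReduce model, here is a minimal in-process sketch of one k-means iteration split into map, shuffle, and reduce stages. It does not use the Hadoop API and is not the paper's proposed algorithm; the function names (`map_phase`, `shuffle`, `reduce_phase`, `kmeans_iteration`) are hypothetical, and the shuffle is simulated with a dictionary rather than performed by a framework.

```python
from collections import defaultdict


def map_phase(point, centroids):
    """Mapper: emit (index of nearest centroid, (point, 1))."""
    nearest = min(
        range(len(centroids)),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(point, centroids[j])),
    )
    return nearest, (point, 1)


def shuffle(pairs):
    """Group mapper output by key, as the framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: average all points assigned to one centroid."""
    count = sum(c for _, c in values)
    dims = len(values[0][0])
    total = [sum(p[i] for p, _ in values) for i in range(dims)]
    return key, tuple(t / count for t in total)


def kmeans_iteration(points, centroids):
    """Run one map -> shuffle -> reduce pass and return updated centroids."""
    pairs = [map_phase(p, centroids) for p in points]
    grouped = shuffle(pairs)
    new_centroids = list(centroids)  # centroids with no points stay unchanged
    for key, values in grouped.items():
        _, new_centroids[key] = reduce_phase(key, values)
    return new_centroids
```

Each mapper call depends only on one point and the current centroids, so the map stage can run in parallel across data blocks; repeating `kmeans_iteration` until the centroids stop changing gives the full algorithm.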

MapReduce
Reduce phase
Proposed system
Existing system approach
Limitations
K-means clustering algorithm
Canopy k-means clustering algorithm
Result and analysis
Conclusion
