Abstract

In the course of the development of information technology, advances in scientific theory have driven progress in science and technology. This progress has affected the field of education and is changing the way education is delivered. The arrival of the era of big data has played an important role in promoting and disseminating educational resources, so that more and more people benefit from them. Modern distance education builds on big data and cloud computing and comprises a series of tools that support a variety of teaching modes. Clustering algorithms can provide an effective method for evaluating students' personality characteristics and learning status in distance education. However, the traditional K-means clustering algorithm suffers from randomness, uncertainty, and high time complexity, and it does not meet the requirements of large-scale data processing. In this paper, we study a parallel K-means clustering algorithm based on the Hadoop cloud computing platform and present the design and strategy of the algorithm. We then carry out experiments on data sets of several different sizes and compare the performance of the proposed method with that of a general clustering method. The experimental results show that the accelerated algorithm achieves good speedup at low cost. It is suitable for the analysis and mining of large data sets in distance higher education.

Highlights

  • In the course of the development of information technology, advances in scientific theory have driven progress in science and technology

  • The traditional K-means clustering algorithm suffers from randomness, uncertainty, and high time complexity, and it does not meet the requirements of large-scale data processing

  • This paper is organized as follows: Section 2 introduces the basics of the Hadoop architecture


Summary

INTRODUCTION

In the course of the development of information technology, advances in scientific theory have driven progress in science and technology. A modern distance education teaching system is composed of a series of teaching tools that support a variety of teaching modes, including a learning system, a teaching system, a teaching resource editing system, a counseling and answering system, an examination system, an evaluation system, communication tools, a virtual experiment system, and a search engine. In machine learning, clustering algorithms are applied to image segmentation and machine vision; in image processing, they are used for data compression and information retrieval. To enable clustering analysis of big data, some scholars have implemented parallel clustering algorithms based on distributed programming models. This paper studies a parallel K-means clustering algorithm based on the Hadoop cloud computing platform and presents the method and strategy of the algorithm.
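As an illustration of the general idea, a single iteration of K-means can be expressed in a map/combine/reduce style. The sketch below is plain Python that only mimics the MapReduce phases in one process; it is not the paper's implementation and does not use the Hadoop API. All function names (`map_phase`, `combine`, `reduce_phase`, `kmeans_iteration`) and the policy of leaving an empty cluster's centroid unchanged are assumptions made for this example.

```python
from math import dist  # Euclidean distance, Python 3.8+


def map_phase(points, centroids):
    """Map: for each point in one data split, emit (nearest centroid index, point)."""
    for p in points:
        idx = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        yield idx, p


def combine(pairs):
    """Combine: reduce one split's output to per-cluster (coordinate sums, count)."""
    partial = {}
    for idx, p in pairs:
        sums, n = partial.get(idx, ([0.0] * len(p), 0))
        partial[idx] = ([a + b for a, b in zip(sums, p)], n + 1)
    return partial


def reduce_phase(partials, centroids):
    """Reduce: merge the partial sums from all splits and compute new centroids."""
    total = {}
    for partial in partials:
        for idx, (sums, n) in partial.items():
            t_sums, t_n = total.get(idx, ([0.0] * len(sums), 0))
            total[idx] = ([a + b for a, b in zip(t_sums, sums)], t_n + n)
    new_centroids = []
    for i, old in enumerate(centroids):
        if i in total:
            sums, n = total[i]
            new_centroids.append(tuple(x / n for x in sums))
        else:
            # Assumption: a cluster that received no points keeps its old centroid.
            new_centroids.append(tuple(old))
    return new_centroids


def kmeans_iteration(splits, centroids):
    """One K-means iteration over a list of splits (each split is a list of points)."""
    partials = [combine(map_phase(split, centroids)) for split in splits]
    return reduce_phase(partials, centroids)
```

In a real cluster each split would be processed by a separate mapper, and only the small per-cluster sums and counts would travel to the reducer, which is what makes the iteration cheap to parallelize. Repeating `kmeans_iteration` until the centroids stop changing gives the full algorithm.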

HADOOP
Clustering criterion
The type of clustering
Serial K-means algorithm
Parallel K-means algorithm
EXPERIMENT AND RESULT ANALYSIS
CONCLUSION