Abstract

Applying big data technology to higher education is both a major application area for the technology and an emerging direction in the development of higher education itself. This article builds a big data processing platform on the Hadoop storage architecture, using Hive, Flume for data collection, and Sqoop for data synchronization, to process large data sets efficiently. Traditional data mining algorithms are reimplemented in the MapReduce programming model, and their implementation on the Hadoop platform is studied, mainly with respect to execution efficiency and scalability. We take data clustering as a representative data mining task and write a MapReduce version of it to test and verify its behavior on the Hadoop platform. Comparative experiments across different cluster sizes and data sizes indicate that running data mining tasks on a Hadoop distributed system yields good speedup and efficiency, and a scalability analysis of computing capacity suggests considerable further potential.
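
As a rough illustration of how a clustering task can be expressed in the MapReduce model, the sketch below runs one k-means iteration in plain Python. The abstract does not name the clustering algorithm, so k-means is an assumption here, as are all function names (`map_phase`, `shuffle`, `reduce_phase`, `speedup`, `efficiency`). The in-memory `shuffle` only stands in for the grouping that Hadoop performs between the map and reduce stages. The last two helpers use the standard definitions of speedup (single-node time divided by n-node time) and efficiency (speedup divided by n).

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Point = Tuple[float, ...]


def nearest(point: Point, centroids: List[Point]) -> int:
    """Return the index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )


def map_phase(points: Iterable[Point], centroids: List[Point]) -> Iterator[Tuple[int, Point]]:
    """Map: emit (cluster_id, point) for every input point."""
    for point in points:
        yield nearest(point, centroids), point


def shuffle(pairs: Iterable[Tuple[int, Point]]) -> Dict[int, List[Point]]:
    """Group values by key; a stand-in for Hadoop's shuffle-and-sort stage."""
    groups: Dict[int, List[Point]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(members: List[Point]) -> Point:
    """Reduce: average the points assigned to one cluster to get its new centroid."""
    n = len(members)
    return tuple(sum(dim) / n for dim in zip(*members))


def kmeans_iteration(points: Iterable[Point], centroids: List[Point]) -> List[Point]:
    """One full map -> shuffle -> reduce pass; empty clusters keep their old centroid."""
    groups = shuffle(map_phase(points, centroids))
    return [
        reduce_phase(groups[i]) if i in groups else centroids[i]
        for i in range(len(centroids))
    ]


def speedup(t_single: float, t_cluster: float) -> float:
    """Speedup: run time on one node divided by run time on the cluster."""
    return t_single / t_cluster


def efficiency(t_single: float, t_cluster: float, nodes: int) -> float:
    """Efficiency: speedup divided by the number of nodes."""
    return speedup(t_single, t_cluster) / nodes


if __name__ == "__main__":
    # Toy data, made up for illustration only.
    pts = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
    print(kmeans_iteration(pts, [(0.0, 0.0), (10.0, 10.0)]))
```

In an actual Hadoop job the map and reduce functions would run on separate nodes over HDFS blocks, and the iteration would be repeated until the centroids stop moving; the timing values passed to `speedup` and `efficiency` would come from measured runs at different cluster sizes.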
