Abstract

Spark is a low-latency distributed computing system for large data sets. It is compatible with Hadoop data sources, is reported to run up to about 100 times faster than MapReduce, and is particularly well suited to machine learning. Spark is still at an early stage and has not yet entered a period of rapid development. However, the release of Spark 1.0.0 marked Spark, a top-level Apache open-source project, as an upstart in big data; it is attracting growing attention from the IT industry and is likely to be widely adopted. In this work we deploy a Spark platform and use it to study the factors that affect the quality of online video streaming. This paper introduces the background of the research and describes the composition and working principles of Spark in detail. Following the requirements of the experiment, we complete the overall configuration of the platform, verify its performance, and study its machine learning library. First, we introduce the user requirements and architectural model of the distributed file system widely adopted in industry. Then, we introduce the architecture of the RDD (Resilient Distributed Dataset). Finally, we use the KMeans clustering algorithm to analyze the relationship between video viewing time and the number of buffering events, and summarize the relationships among the related streaming-media factors. The experimental platform runs Linux Ubuntu 12.04 LTS, with Apache Spark as the application platform. All system preparation, debugging, and testing were carried out on this platform.
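To make the clustering step concrete, here is a minimal plain-Python sketch of Lloyd's k-means applied to (viewing time, buffering count) pairs. It is not the authors' Spark code. The function name `kmeans`, the sample session data, and the choice of the first k points as initial centroids are illustrative assumptions; production libraries typically use smarter seeding such as k-means++ or a parallel variant of it.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (viewing time in minutes, number of buffering events)


def kmeans(points: List[Point], k: int, max_iter: int = 100) -> Tuple[List[Point], List[int]]:
    """Lloyd's k-means on 2-D points (hypothetical helper, for illustration only).

    Assumption: the first k points serve as the initial centroids.
    Returns (centroids, labels), where labels[i] is the cluster index of points[i].
    """
    if k <= 0 or k > len(points):
        raise ValueError("k must be between 1 and len(points)")

    def nearest(p: Point, cs: List[Point]) -> int:
        # Index of the centroid closest to p by Euclidean distance.
        return min(range(len(cs)), key=lambda c: math.dist(p, cs[c]))

    centroids: List[Point] = [(float(x), float(y)) for x, y in points[:k]]
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        labels = [nearest(p, centroids) for p in points]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids: List[Point] = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                n = len(members)
                new_centroids.append((sum(x for x, _ in members) / n,
                                      sum(y for _, y in members) / n))
            else:
                new_centroids.append(centroids[c])  # empty cluster keeps its old centroid
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    # Final assignment so labels match the returned centroids.
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


if __name__ == "__main__":
    # Hypothetical sessions: (minutes watched, buffering events).
    sessions = [(45.0, 1), (50.0, 0), (48.0, 2), (5.0, 9), (3.0, 12), (6.0, 10)]
    cents, labs = kmeans(sessions, k=2)
    print(cents, labs)
```

On this made-up data the sketch separates long sessions with few buffering events from short sessions with many, which is the kind of relationship the clustering step is meant to surface. A Spark implementation would distribute the assignment and update steps over an RDD rather than loop over a Python list.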
