Research on big data analysis and processing system based on Spark platform

Yansong Li

doi:10.1109/mlise57402.2022.00059

Abstract

This article designs and implements a big data analysis and processing system on the distributed Spark platform for processing large-scale time series data. The system framework is divided into three layers: a storage layer, an operator layer, and an algorithm layer. At the storage layer, the system organizes and indexes large-scale time series data using HDFS and Hive. At the operator layer, the system provides the basic operations commonly applied to time series data on Spark, and users can call these operators directly to build custom time series processing algorithms. At the algorithm layer, the system implements several commonly used time series analysis algorithms on Spark, including similarity query, clustering, and forecasting, which users can apply directly. Tests of the system's performance and functionality verify its feasibility and practicality.
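The abstract does not say which concrete methods back the operator and algorithm layers. The following is a minimal single-machine sketch of how algorithm-layer functions can be composed from reusable operators. It makes several assumptions: similarity query uses Euclidean distance on z-normalized series, clustering is plain k-means, and forecasting is a moving average. All function names are hypothetical. The sketch does not model Spark distribution or the HDFS/Hive storage layer.

```python
import math
from typing import Dict, List, Sequence

# --- Operator layer (hypothetical stand-ins for basic time series operators) ---

def z_normalize(series: Sequence[float]) -> List[float]:
    """Rescale a series to zero mean and unit (population) standard deviation."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
    if std == 0:
        return [0.0] * n  # constant series: avoid division by zero
    return [(x - mean) / std for x in series]

def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# --- Algorithm layer, composed from the operators above (assumed methods) ---

def similarity_query(query: Sequence[float], db: Dict[str, Sequence[float]], k: int) -> List[str]:
    """Return the ids of the k series closest to `query` after z-normalization."""
    q = z_normalize(query)
    scored = sorted((euclidean(q, z_normalize(s)), sid) for sid, s in db.items())
    return [sid for _, sid in scored[:k]]

def kmeans(data: List[List[float]], k: int, iters: int = 20) -> List[int]:
    """Plain k-means (Lloyd's algorithm); the first k series seed the centroids."""
    centroids = [list(data[i]) for i in range(k)]
    labels = [0] * len(data)
    for _ in range(iters):
        # Assign each series to its nearest centroid.
        labels = [min(range(k), key=lambda c: euclidean(x, centroids[c])) for x in data]
        # Recompute each centroid as the point-wise mean of its members.
        for c in range(k):
            members = [x for x, lab in zip(data, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def moving_average_forecast(series: Sequence[float], window: int, horizon: int) -> List[float]:
    """Forecast `horizon` steps; each step is the mean of the last `window` values."""
    history = list(series)
    out: List[float] = []
    for _ in range(horizon):
        nxt = sum(history[-window:]) / window
        out.append(nxt)
        history.append(nxt)  # feed the forecast back in for multi-step prediction
    return out
```

Z-normalizing before comparison makes the query match on shape rather than scale. An increasing query therefore treats `[1, 2, 3, 4]` and `[10, 20, 30, 40]` as equally close. In a Spark deployment, each function would plausibly run as a distributed map or aggregation over series records. That mapping is an assumption, not something the abstract specifies.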
