Abstract

With the rapid development of information technology and the Internet, data in all kinds of industries has grown explosively, making it difficult to analyze big data and mine useful information from it. Traditional analysis systems face performance and scalability bottlenecks when processing big data, so the research and development of novel, efficient big data analysis and mining platforms has become a focus for many organizations. With the development of the smart grid, power data, which has characteristics specific to the power industry, calls for more targeted and efficient data mining and analysis. In this paper, to address the shortcomings of existing work, we propose a distributed big data mining platform built on distributed system infrastructure such as Hadoop and Spark. The platform implements a variety of fast, highly parallel mining algorithms with Spark and TensorFlow, covering machine learning, statistical analysis, deep learning, and other areas. It uses OSGi technology to build a loosely coupled component model that improves the reusability of algorithm components, and it introduces a workflow engine and a user-friendly GUI that reduce the complexity of user operations and support user-defined data mining tasks. To handle the characteristics of smart grid big data, the platform develops and improves dozens of algorithm components for data processing and analysis. In addition, a scalable algorithm library and component library greatly improve the scalability of the platform and its ability to process smart grid data. The platform has already been put into operation at a State Grid company, meeting the demands of a variety of smart grid data analysis services.
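The abstract does not describe how the component model and workflow engine are implemented. As an illustration only, the Python sketch below shows the general idea of loosely coupled algorithm components that are registered by name and wired into a user-defined workflow (a DAG) executed in dependency order. It is a sketch under stated assumptions: it does not use OSGi (a Java module system), Spark, or TensorFlow, and the `Registry` class, `run_workflow` function, and the `load`, `drop_missing`, and `mean` components are all hypothetical.

```python
from abc import ABC, abstractmethod
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Any, Dict, List


class Component(ABC):
    """Hypothetical algorithm component: one self-contained unit with a single entry point."""
    name: str

    @abstractmethod
    def run(self, inputs: List[Any], **params: Any) -> Any:
        ...


class Registry:
    """Hypothetical registry: workflows refer to components by name only, keeping coupling low."""

    def __init__(self) -> None:
        self._components: Dict[str, type] = {}

    def register(self, cls: type) -> type:
        self._components[cls.name] = cls
        return cls

    def create(self, name: str) -> Component:
        return self._components[name]()


registry = Registry()


@registry.register
class LoadReadings(Component):
    name = "load"

    def run(self, inputs: List[Any], **params: Any) -> Any:
        # Stand-in for a data source; real data would come from HDFS or similar.
        return list(params["values"])


@registry.register
class DropMissing(Component):
    name = "drop_missing"

    def run(self, inputs: List[Any], **params: Any) -> Any:
        # Simple preprocessing step: discard missing readings.
        return [v for v in inputs[0] if v is not None]


@registry.register
class Mean(Component):
    name = "mean"

    def run(self, inputs: List[Any], **params: Any) -> Any:
        data = inputs[0]
        return sum(data) / len(data)


def run_workflow(nodes: Dict[str, dict], edges: Dict[str, List[str]]) -> Dict[str, Any]:
    """Run a user-defined DAG.

    nodes: node id -> {"component": registered name, "params": {...}}
    edges: node id -> list of upstream node ids (passed to run() in this order)
    """
    results: Dict[str, Any] = {}
    graph = {node_id: edges.get(node_id, []) for node_id in nodes}
    # static_order() yields every node after all of its predecessors.
    for node_id in TopologicalSorter(graph).static_order():
        spec = nodes[node_id]
        component = registry.create(spec["component"])
        inputs = [results[up] for up in edges.get(node_id, [])]
        results[node_id] = component.run(inputs, **spec.get("params", {}))
    return results


if __name__ == "__main__":
    nodes = {
        "src": {"component": "load", "params": {"values": [1.0, None, 3.0, 5.0]}},
        "clean": {"component": "drop_missing"},
        "avg": {"component": "mean"},
    }
    edges = {"clean": ["src"], "avg": ["clean"]}
    print(run_workflow(nodes, edges)["avg"])  # prints 3.0
```

Because the workflow only names components, a new algorithm can be added by registering another `Component` subclass without changing the engine, which is the kind of reusability and extensibility the abstract attributes to its component library.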
