Abstract

Applying big data techniques to the power system will contribute to the sustainable development and robust operation of China Southern Power Grid, so a new framework for the China Southern Power Grid big data platform is needed. Beyond key technologies such as data analysis, data processing, and data visualization, data integration and fusion in the data warehouse play an important role in high-quality data analysis and mining. To minimize operation time and memory consumption, several scheduling strategies for extract–transform–load (ETL) workflows are proposed: a round-robin algorithm, a minimum-cost algorithm, a minimum-memory algorithm, and a mixture of the minimum-cost and minimum-memory algorithms. In combination with these algorithms, a workflow is divided into many subflows using algorithms such as shortest-subflow-first and priority-backfilling, which further improves parallel computation. Three combined algorithms for scheduling subflows are then established: minimum-cost and minimum-memory with shortest-subflow-first, minimum-cost and minimum-memory with priority-backfilling, and minimum-cost and minimum-memory with shortest-subflow-first and priority-backfilling. Finally, given the characteristics of China Southern Power Grid big data, several performance indexes are used to evaluate these algorithms. The experimental results show that the minimum-cost and minimum-memory with shortest-subflow-first and priority-backfilling algorithm outperforms the hybrid prioritization algorithm based on the rank level of each task (hybrid), online workflow management, minimum-cost and minimum-memory with shortest-subflow-first, and minimum-cost and minimum-memory with priority-backfilling, and that system robustness is also significantly improved.
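The abstract names shortest-subflow-first only at a high level, so the following is a minimal sketch of one plausible reading rather than the authors' algorithm. It assumes each subflow has a known estimated cost and that subflows are independent once the workflow has been split; the function name `shortest_subflow_first`, the cost dictionary, and the worker count are all hypothetical.

```python
import heapq


def shortest_subflow_first(subflow_costs, n_workers):
    """Assign subflows to workers in ascending order of estimated cost.

    Assumption (not from the paper): subflows are independent and each
    worker runs one subflow at a time.
    Returns (assignment, makespan), where assignment maps a subflow name
    to (worker id, start time, end time).
    """
    # Min-heap of (time at which the worker becomes free, worker id).
    workers = [(0.0, w) for w in range(n_workers)]
    heapq.heapify(workers)

    assignment = {}
    # Shortest subflow first; ties are broken by name for determinism.
    for name, cost in sorted(subflow_costs.items(), key=lambda kv: (kv[1], kv[0])):
        free_at, worker = heapq.heappop(workers)
        assignment[name] = (worker, free_at, free_at + cost)
        heapq.heappush(workers, (free_at + cost, worker))

    makespan = max((end for _, _, end in assignment.values()), default=0.0)
    return assignment, makespan


# Hypothetical subflow costs in arbitrary time units.
costs = {"a": 5.0, "b": 2.0, "c": 3.0, "d": 1.0}
assignment, makespan = shortest_subflow_first(costs, n_workers=2)
```

Running short subflows first tends to lower average waiting time, at the price of pushing long subflows toward the end of the schedule; a backfilling step, which the abstract also mentions, is one common way to offset that.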

Highlights

  • Big data mainly comes from the energy management system (EMS), distribution management system (DMS), automatic measurement system (AMS), marketing management system (MMS), customer service system (CCS), geographic information system (GIS), weather prediction system (WPS), and social economy data (SED)

  • The scheduling length ratio (SLR) is the ratio of a workflow's operation time to its best possible scheduling length. It is designed to evaluate scheduling algorithms independently of workflow size and is computed by dividing the operation time of a workflow by its critical path length (CPL)

  • A new framework for a big data platform at China Southern Power Grid (CSG) is built, in which data integration and fusion play a key role in system performance
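The SLR definition in the highlights can be illustrated with a short calculation. This is a sketch under stated assumptions, not the paper's code: the workflow is modeled as a DAG of tasks with known durations, the CPL is taken as the longest duration-weighted path through that DAG, and the task names, durations, and measured operation time are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def critical_path_length(durations, predecessors):
    """Length of the longest duration-weighted path through the task DAG."""
    graph = {task: predecessors.get(task, ()) for task in durations}
    finish = {}
    # static_order() yields every task after all of its predecessors.
    for task in TopologicalSorter(graph).static_order():
        start = max((finish[p] for p in graph[task]), default=0.0)
        finish[task] = start + durations[task]
    return max(finish.values())


def scheduling_length_ratio(operation_time, durations, predecessors):
    """SLR = workflow operation time divided by critical path length."""
    return operation_time / critical_path_length(durations, predecessors)


# Hypothetical ETL workflow: extract feeds clean and aggregate, both feed load.
durations = {"extract": 4.0, "clean": 3.0, "aggregate": 2.0, "load": 1.0}
predecessors = {
    "clean": ["extract"],
    "aggregate": ["extract"],
    "load": ["clean", "aggregate"],
}

cpl = critical_path_length(durations, predecessors)          # extract -> clean -> load
slr = scheduling_length_ratio(10.0, durations, predecessors)  # assumed operation time of 10.0
```

Because the CPL is a lower bound on the operation time for these durations, the SLR is at least 1, and values closer to 1 indicate a schedule closer to the best possible length.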


Summary

Introduction

With the global energy problem becoming increasingly serious, countries around the world are researching the smart grid.[1,2,3] The ultimate goal of the smart grid is to build a comprehensive power system covering the whole production process, including power generation, power transmission, power transformation, power distribution, power dispatch, and power consumption.

1 CSG Electric Power Research Institute, China Southern Power Grid, Guangzhou, China; 2 Guangzhou Kit Information Technology Co., Ltd., Guangzhou, China.

A scheduling framework has been put forward for the data warehouse (DW); although static scheduling, dynamic scheduling, and same-layer division are carried out, an accurate scheduling model and an overall algorithm description are left out.[19] A greedy algorithm has been applied to optimal workflow scheduling, but it is limited to a single workflow and cannot guarantee performance under multi-workflow conditions.[20] A derivation mode of the primary table has been proposed to optimize the ETL process, together with a pipeline optimization method for ETL operation; however, it rests on the premise that all ETL activities are serially constrained and therefore lacks generality.[21] A possible physical implementation of an ETL workflow has been proposed that takes a logical-level description and an appropriate cost model as inputs, but it neglects the details of workflow operation.[22] To search for alternative physical implementations with lower cost, this algorithm was extended by intentionally introducing sorting activities into the workflow, but no comparative experiments, across workflow styles or for the algorithm itself, are shown.[23] These drawbacks motivate further improvement of ETL workflow schedulers, and efficient ETL operation has become a research topic aimed at minimizing ETL operation time and memory consumption.

Each other, comparison of different algorithms, and robustness performance evaluation

Background
Experimental results and analysis
Evaluation of proposed algorithms
Conclusion