Abstract

This manuscript presents a data quality analysis and holistic ’machine learning-readiness’ evaluation of a representative set of large-scale, real-world phasor measurement unit (PMU) datasets provided under the United States Department of Energy-funded FOA 1861 research program [1]. A major focus of this study is to understand the present-day suitability of large-scale, real-world synchrophasor datasets for application of commercially-available, off-the-shelf big data and supervised or semi-supervised machine learning (ML) tools and catalogue any major obstacles to their application. To this end, dataset quality is methodically examined through an interconnect-wide quantifications of basic bad data occurrences, a summary of several harder-to-detect data quality issues that can jeopardize successful application of machine learning, and an evaluation of the adequacy of event log labeling for supervised training of models used for online event classification. A global ’six-point’ statistical analyses of several key dataset variables is demonstrated as a means by which to identify additional hard-to-detect data quality issues, also providing an example successful application of big data technology to extract insights regarding reasonable operational bounds of the US power system. Obstacles for application of commercial ML technologies are summarized, with a particular focus on supervised and semi-supervised ML. Lessons-learned are provided regarding challenges associated with present-day event labeling practices, large spatial scope of the dataset, and dataset anonymization. Finally, insight into efficacy of employed mitigation strategies are discussed, and recommendations for future work are made.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.