Abstract
Data on behind-the-meter photovoltaic (PV) installations can be difficult for researchers to obtain. Several investigators have proposed deep learning as an attractive solution to this challenge, since it can identify PV installations directly from aerial or satellite images. However, deep learning models are well known to struggle with data from sources they were never exposed to during training. This study investigated whether generalizability can be improved by diversifying training data across the available labeled data sources. We assessed the performance of models trained on all possible combinations of six labeled datasets of aerial PV imagery, holding the total number of training images fixed. Our results indicate that no combination of training data achieved generalized performance approaching that of models trained on data from the target data source. This implies that generalized ResNet models cannot be developed simply by changing the composition of the training data. Researchers should therefore expect that some degree of data labeling will likely be necessary when adapting these models to new applications. Our results do indicate, however, that significant performance improvements are possible with only small (∼20%) additions of target data. Future work may investigate alternative architectures, expanded training datasets, or ways to reduce the amount of labeled data needed to adapt a model for a given application.
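As a rough illustration of the experimental design described above, the sketch below enumerates every non-empty combination of six data sources and divides a fixed image budget among the sources in each combination. The source names, the budget of 6000 images, and the even split are all assumptions made for this example; the abstract does not say how images were allocated within a combination.

```python
from itertools import combinations

# Hypothetical placeholder names: the abstract does not name the six datasets.
SOURCES = ["A", "B", "C", "D", "E", "F"]
# Assumed fixed training budget, not a figure taken from the study.
TOTAL_IMAGES = 6000


def allocate(sources, total):
    """Split `total` images as evenly as possible across `sources` (assumed rule)."""
    base, extra = divmod(total, len(sources))
    # The first `extra` sources receive one additional image each.
    return {s: base + (1 if i < extra else 0) for i, s in enumerate(sources)}


def training_configs(sources=SOURCES, total=TOTAL_IMAGES):
    """Yield one image allocation per non-empty combination of sources."""
    for k in range(1, len(sources) + 1):
        for combo in combinations(sources, k):
            yield allocate(combo, total)


configs = list(training_configs())
```

With six sources this yields 2^6 − 1 = 63 training configurations, each using the same total number of images, so differences in performance reflect the mix of sources rather than the amount of training data.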