This study uses satellite data to detect undocumented oil and gas wells, which pose significant environmental concerns, including greenhouse gas emissions. Three key findings emerge. First, to address imbalanced data, the study recommends oversampling techniques such as Rotation–GaussianBlur–Solarization data augmentation (RGS), the Synthetic Minority Over-Sampling Technique (SMOTE), or ADASYN (an extension of SMOTE) over undersampling techniques. Borderline SMOTE is less effective than the other oversampling techniques because its performance relies heavily on the quality and distribution of data near the decision boundary. Second, incorporating models pre-trained on large-scale datasets enhances generalization: models trained on one county's dataset achieve high overall accuracy, recall, and F1 scores that carry over to other areas, which allows wider application. Finally, including persistent homology (PH) as an additional input improves performance in in-distribution testing but may affect the model's generalization in out-of-distribution testing, so PH's impact on overall performance and generalizability should be weighed carefully. Overall, this study provides a robust approach to identifying undocumented oil and gas wells, helping to accelerate progress toward a net-zero economy and supporting environmental sustainability efforts.
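To illustrate what SMOTE-style oversampling does, the following is a minimal NumPy sketch of the core idea: each synthetic minority sample is placed at a random point on the segment between a minority sample and one of its k nearest minority neighbours. It assumes each image has already been reduced to a fixed-length feature vector; the function name `smote` and its parameters are hypothetical and this is not the authors' pipeline. In practice, the imbalanced-learn library provides ready-made `SMOTE`, `BorderlineSMOTE`, and `ADASYN` classes.

```python
import numpy as np

def smote(X_min, n_new, k=5, seed=0):
    """Hypothetical SMOTE-style sketch: generate n_new synthetic minority samples
    by interpolating between minority samples and their k nearest minority neighbours."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    if n < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, n - 1)

    # Pairwise Euclidean distances among minority samples only.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # a sample is not its own neighbour
    neighbors = np.argsort(d, axis=1)[:, :k]  # indices of k nearest neighbours

    # Pick a random base sample and one of its neighbours for each new point.
    base = rng.integers(0, n, size=n_new)
    nb = neighbors[base, rng.integers(0, k, size=n_new)]

    # Interpolate: base + gap * (neighbour - base), with gap drawn from [0, 1).
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[nb] - X_min[base])

X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
synthetic = smote(X_min, n_new=10, k=2)
print(synthetic.shape)
```

Because every synthetic point lies on a line segment between two real minority samples, oversampling adds plausible new minority examples instead of discarding majority data as undersampling does. Borderline SMOTE restricts this interpolation to minority samples near the class boundary, which is why its results depend on how well that boundary region is represented.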