Abstract

Abstract With balanced training sample (TS) data, learning algorithms offer good results in lithology classification. Meanwhile, unprecedented lithological mapping in remote places is predicted to be difficult, resulting in limited and unbalanced samples. To address this issue, we can use a variety of techniques, including ensemble learning (such as random forest [RF]), over/undersampling, class weight tuning, and hybrid approaches. This work investigates and analyses many strategies for dealing with imbalanced data in lithological classification based on RF algorithms with limited drill log samples using remote sensing and airborne geophysical data. The research was carried out at Komopa, Paniai District, Papua Province, Indonesia. The class weight tuning, oversampling, and balance class weight procedures were used, with TSs ranging from 25 to 500. The oversampling approach outperformed the class weight tuning and balance class weight procedures in general, with the following metric values: 0.70–0.80 (testing accuracy), 0.43–0.56 (F1 score), and 0.32–0.59 (Kappa score). The visual comparison also revealed that the oversampling strategy gave the most reliable classifications: if the imbalance ratio is proportionate to the coverage area in each lithology class, the classifier capability is optimal.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call