Dissolved gas analysis of the insulating oil in power transformers can provide valuable information for fault diagnosis. Power transformer datasets are often imbalanced, which degrades the performance of machine learning-based fault classifiers. A critical step is choosing an appropriate evaluation metric for selecting features, models, and oversampling techniques. However, no clear and comprehensive guidance on this choice is available to date. In this work, we address this gap by introducing new evaluation metrics tailored to the problem. Our results and discussion offer new insights into which learning setups are most effective for imbalanced datasets.