Unmitigated disruptions pose a much more serious threat when large-scale tokamaks operate in the high-performance regime. Machine-learning-based disruption predictors can exhibit impressive performance, but their effectiveness depends on a substantial amount of training data. In future reactors, obtaining a substantial amount of disruption data in high-performance regimes without risking damage to the machine is highly improbable. Developing machine learning disruption predictors on data from the low-performance regime and transferring them to the high-performance regime is an effective solution for large reactor-sized tokamaks such as ITER and beyond. In this study, a number of models are trained using different subsets of data from the HL-2A tokamak. A SHapley Additive exPlanations (SHAP) analysis of the models reveals different, even contradictory, patterns between performance regimes. Thus, simply mixing data from different performance regimes will not yield optimal results. Based on this analysis, we propose an instance-based transfer learning technique that trains the model on a dataset generated with an optimized strategy. The strategy involves instance and feature selection based on the physics behind the differences between high- and low-performance discharges, as revealed by the SHAP model analysis. The TrAdaBoost technique significantly improved the model's balanced accuracy (BA) from 0.78 to 0.86 using only a small amount of high-performance operation data.
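For readers unfamiliar with the two quantities named above, the following is a minimal sketch, not the authors' implementation. It assumes binary labels (1 = disruptive, 0 = non-disruptive) and follows the standard TrAdaBoost formulation, in which misclassified source-domain (here, low-performance) instances are down-weighted and misclassified target-domain (high-performance) instances are up-weighted at each boosting round. The function names, argument names, and the toy numbers in the usage note are illustrative assumptions.

```python
import math


def balanced_accuracy(y_true, y_pred):
    """Mean of the true-positive rate and the true-negative rate (0/1 labels)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    return 0.5 * (tp / pos + tn / neg)


def tradaboost_reweight(w_src, w_tgt, miss_src, miss_tgt, n_rounds):
    """One weight update of standard TrAdaBoost (hypothetical helper).

    w_src, w_tgt       : current instance weights for source / target data
    miss_src, miss_tgt : 1 where the current weak learner misclassified, else 0
    n_rounds           : total number of boosting rounds planned
    """
    # The weighted error is measured on the target-domain instances only.
    eps = sum(w * m for w, m in zip(w_tgt, miss_tgt)) / sum(w_tgt)
    if not 0.0 < eps < 0.5:
        raise ValueError("target error must lie strictly between 0 and 0.5")

    beta_tgt = eps / (1.0 - eps)
    beta_src = 1.0 / (1.0 + math.sqrt(2.0 * math.log(len(w_src)) / n_rounds))

    # Misclassified source instances lose weight (beta_src < 1) ...
    new_src = [w * beta_src ** m for w, m in zip(w_src, miss_src)]
    # ... while misclassified target instances gain weight (beta_tgt < 1).
    new_tgt = [w * beta_tgt ** (-m) for w, m in zip(w_tgt, miss_tgt)]
    return new_src, new_tgt
```

As a usage illustration with toy values, `tradaboost_reweight([1, 1, 1, 1], [1, 1, 1, 1], [1, 0, 0, 0], [1, 0, 0, 0], n_rounds=10)` triples the weight of the single misclassified target instance and shrinks the weight of the misclassified source instance, so later rounds concentrate on the scarce high-performance data.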