Abstract

A new data cleaning procedure for the electron cyclotron emission imaging (ECEI) diagnostic of the EAST tokamak is developed. Machine learning techniques, including support vector machines (SVMs) and decision trees, are applied to identify saturated, zero, and weak signals in the raw ECEI data. As a result, the burden of data analysis is reduced and the classification accuracy is improved. Suitable training sets are sampled from the large volume of raw ECEI data from the EAST tokamak. The optimal window size for the temporal signals, the kernel function, and other model parameters are determined through model training. Five-fold cross-validation (CV) is applied during modeling, and an external test set is used to validate the prediction performance of the models. The average recall rates on the CV sets for saturated, zero, and weak signals are 95.9%, 96.72%, and 100%, respectively, demonstrating the accuracy of this procedure. Random Forest is also applied to the same data sets for comparison; its average recall rates on the CV sets for saturated, zero, and weak signals are 95.9%, 96.72%, and 95.88%, respectively. Our method is shown to outperform Random Forest on small data sets.
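The evaluation protocol described above (five-fold CV, with each class's recall averaged over the held-out folds) can be sketched in plain Python. This is a minimal sketch that assumes feature vectors have already been extracted from windowed ECEI signals. The two-element feature vectors, the synthetic data, the nearest-centroid classifier, and all function names are hypothetical stand-ins. They are not the authors' SVM or decision-tree models, their features, or their data.

```python
from collections import defaultdict
import random


def make_folds(n_samples, k=5, seed=0):
    """Shuffle sample indices and deal them into k nearly equal folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def per_class_recall(y_true, y_pred):
    """Recall per class = correctly predicted members / all true members of that class."""
    hits, totals = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return {c: hits[c] / totals[c] for c in totals}


def cross_validated_recall(X, y, fit, predict, k=5, seed=0):
    """Average each class's recall over the k held-out folds.

    A class absent from a given test fold is skipped for that fold.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for test_idx in make_folds(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        preds = [predict(model, X[i]) for i in test_idx]
        for c, r in per_class_recall([y[i] for i in test_idx], preds).items():
            sums[c] += r
            counts[c] += 1
    return {c: sums[c] / counts[c] for c in sums}


# Hypothetical stand-in classifier (nearest centroid), NOT the paper's SVM or decision tree.
def fit_centroids(X, y):
    """Return the mean feature vector of each class."""
    groups = defaultdict(list)
    for x, label in zip(X, y):
        groups[label].append(x)
    return {lab: [sum(col) / len(col) for col in zip(*rows)] for lab, rows in groups.items()}


def predict_centroid(centroids, x):
    """Assign x to the class whose centroid is closest in squared Euclidean distance."""
    return min(centroids, key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])))


# Synthetic, hand-made feature vectors (hypothetical (level, spread) pairs per signal window).
X = ([(10.0 + 0.1 * i, 0.0) for i in range(10)]
     + [(0.0, 0.01 * i) for i in range(10)]
     + [(3.0, 1.0 + 0.1 * i) for i in range(10)])
y = ["saturated"] * 10 + ["zero"] * 10 + ["weak"] * 10

print(cross_validated_recall(X, y, fit_centroids, predict_centroid, k=5))
```

Because the toy classes are well separated, every class reaches an average recall of 1.0 here. Swapping in a real classifier only requires another `fit`/`predict` pair with the same shape. Keeping the fold loop separate from the model makes it easy to compare two classifiers (as the abstract does with Random Forest) on identical folds by reusing the same `seed`.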
