Error Type Analysis and Data Labeling through Topic Modeling of Smart Factory Error Text

Keun-Hyung Kim, Byung-Gu Lee, Tae-Soo Moon, Horim Kim, Jae-Jung Kang, Jin-Hyun Ahn

doi:10.37272/jiecr.2022.08.22.4.15

Abstract

Data labeling is the task of annotating training data for artificial intelligence (AI) with the corresponding correct answers. It helps improve the performance of AI learning models, but it is often done manually. Automating data labeling so that high-quality training data can be produced efficiently can serve as a foundation for the development of AI.

In this paper, we propose a method that uses topic modeling to automate the labeling of error data produced by a smart factory control system. Error text files were created by extracting the major error-related items and error messages from a database accumulated in the smart factory operating environment. Before topic modeling, frequently appearing words were extracted through a basic analysis of the error text, and the main causes of errors were roughly identified by visualizing these words with bar graphs and word clouds. Major error-related topics were then extracted by applying topic modeling to the error text. Based on the keywords in each topic, a meaning was assigned to the topic, error types were derived, and error type codes were assigned. Coherence and perplexity were calculated to determine the optimal number of topics, and four to five topics were found to be optimal. This paper is meaningful in that it confirms the possibility of automating data labeling for big data that includes text data.
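The pipeline described above (fit topics on error text, inspect each topic's keywords, then assign an error type code per message) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not name the topic model or library used, so a small collapsed Gibbs sampler for LDA is swapped in here, and the sample messages, the `E01`-style codes, and the function names `tokenize`, `fit_lda`, and `label_documents` are all hypothetical.

```python
import random
from collections import Counter

def tokenize(text):
    # Lowercase, split on whitespace, drop very short tokens.
    return [w for w in text.lower().split() if len(w) > 2]

def fit_lda(docs, num_topics, iters=100, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA over tokenized documents.

    Returns per-document topic counts and per-topic word counts.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [Counter() for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assign = []
    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(zs)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return doc_topic, topic_word

def label_documents(doc_topic):
    # Label each document with the code of its dominant topic (hypothetical codes).
    labels = []
    for row in doc_topic:
        dominant = max(range(len(row)), key=row.__getitem__)
        labels.append(f"E{dominant + 1:02d}")
    return labels

# Hypothetical error messages, not taken from the paper's data.
messages = [
    "sensor timeout on conveyor line",
    "conveyor motor overload detected",
    "database connection lost during write",
    "database write failed connection reset",
    "conveyor sensor signal missing",
    "connection timeout to database server",
]
NUM_TOPICS = 2
docs = [tokenize(m) for m in messages]
doc_topic, topic_word = fit_lda(docs, NUM_TOPICS)
labels = label_documents(doc_topic)

for k in range(NUM_TOPICS):
    top = [w for w, c in topic_word[k].most_common(3) if c > 0]
    print(f"topic E{k + 1:02d} keywords:", top)
for message, label in zip(messages, labels):
    print(label, "|", message)
```

In this sketch a person would still read the printed keywords to give each topic a meaning, as the abstract describes. Choosing the number of topics by coherence and perplexity is not shown; `NUM_TOPICS` is simply fixed at 2 for the toy data.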
