Annotation method of risk data in a certain field based on pattern matching

Weibo Geng, Yingxiao Zhao, Ping Xu, Jiaoyang Cai, Fang Fang

doi:10.1051/e3sconf/202452201046

Abstract

With the development of information technology and the increasing complexity of industrial technology, a certain field urgently needs to use big data and artificial intelligence to improve its management and decision-making. To classify the field’s risk text data with intelligent algorithms, and thereby analyse the risk distribution and the major problems, this paper studies methods for annotating training data in this field. The proposed annotation method is based on pattern matching and addresses the particular difficulties of risk data annotation in this field: highly specialised content, small data volume, and strict requirements on accuracy and timeliness. New matching patterns are generated through text segmentation, keyword extraction, preliminary pattern generation, pattern relation tree construction, pattern optimization, pattern generalization, and pattern verification; the final classification and annotation are then performed by pattern matching. Performance tests covering accuracy, recall rate, and annotation time show that the proposed method outperforms both traditional item-by-item manual annotation and semi-automatic annotation through machine learning in overall performance. The method has strong application value for risk data annotation in this field and can also serve as a reference for high-density, high-accuracy and high-timeliness data annotation in other fields.
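
To make the final matching step concrete, the following is a minimal sketch of annotation by keyword patterns. It does not reproduce the paper's pattern generation steps (relation tree construction, optimization, generalization, verification); it only assumes that verified patterns already exist as ordered keyword sequences. The keywords, labels, the `max_gap` parameter, and the function names `compile_pattern` and `annotate` are all hypothetical and chosen for illustration.

```python
import re

# Hypothetical patterns: each maps an ordered keyword sequence to a risk label.
# Keywords and labels are illustrative assumptions, not taken from the paper.
PATTERNS = [
    (("pump", "leak"), "equipment"),
    (("delivery", "delay"), "schedule"),
    (("cost", "overrun"), "budget"),
]


def compile_pattern(keywords, max_gap=30):
    """Build a regex that matches the keywords in order,
    allowing at most max_gap characters between neighbours."""
    gap = r".{0,%d}?" % max_gap
    body = gap.join(re.escape(k) for k in keywords)
    return re.compile(body, re.IGNORECASE | re.DOTALL)


COMPILED = [(compile_pattern(kws), label) for kws, label in PATTERNS]


def annotate(text):
    """Return the label of the first pattern that matches the text,
    or None when no pattern matches (the item is left unannotated)."""
    for regex, label in COMPILED:
        if regex.search(text):
            return label
    return None


if __name__ == "__main__":
    for sample in [
        "The coolant pump showed a slow leak during inspection.",
        "Supplier delivery was subject to a two-week delay.",
        "Routine meeting held.",
    ]:
        print(annotate(sample), "<-", sample)
```

In this sketch, items that match no pattern return `None`, so they could be routed to manual review rather than being given a forced label; whether the paper handles unmatched items this way is not stated in the abstract.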

Full Text