This article, written by JPT Technology Editor Chris Carpenter, contains highlights of paper IPTC 19485, “Data Mining of Hidden Danger in Enterprise Production Safety and Research of Hidden-Danger-Model Conversion,” by Kun Tian, Hong-Qiao Yan, Ya-Ming Mao, and Shun-Cheng Wu, CNPC, prepared for the 2019 International Petroleum Technology Conference, Beijing, 26–28 March. The paper has not been peer reviewed.

The value of hidden-danger data stored in text can be revealed through an approach that helps sort and interpret information in an ordered way not used previously in safety management. These optimized data then can be used to apply safety-management techniques precisely, centralize operations, and reduce risk levels.

Introduction

The collection and storage of huge amounts of data have exposed a gap between the development of data-collecting capacity and the means to analyze those data accurately. Some experts use analytical methods to study the relationship between security factors and accident events to guide equipment maintenance, quality testing, and related work. However, extracting hidden-danger data stored in text format has been a challenge for the petroleum and petrochemical industries.

The data-mining technique discussed in the complete paper uses Chinese word segmentation, Chinese lexical annotation, named entity recognition, and other techniques to extract keywords from the text. Then, a structured hidden-danger database is built through a process of keyword mapping and extraction, data cleansing and integration, and data selection and transformation. Finally, a data-stream sliding-window model combined with correlation analysis provides a method of correlating hidden dangers and promotes the application of enterprise safety management.

Theoretical Basis

The mining of hidden-danger data consists mainly of the preprocessing of text data, the construction of a structured database, and data analysis.
Collecting a Professional Vocabulary. Hidden-danger data are stored in textual form, but data-mining and machine-learning models cannot deal with these unstructured (or semistructured) types of information directly. Thus, natural-language processing must be used. The collection of a professional vocabulary forms the basis of hidden-danger data analysis. Mechanisms by which this is accomplished include the following:

Chinese word segmentation. The Chinese character sequences of a hidden-danger description are segmented to obtain separate words whose meanings can be recognized automatically by the computer.

New-word collection. Chinese word segmentation can be ineffective in identifying professional vocabulary terms in hidden-danger data. For example, the phrase “health, safety, and environment (HSE) management system quantitative audit standards” might be divided into the subphrases and words “HSE,” “management system,” “quantification,” “audit,” and “standard,” all of which might be unsuited to problem-analysis requirements. Therefore, in practice, one can use the position of each word in the text to find adjacent words that frequently occur together and merge them to build a vocabulary for analysis purposes.

Collation with word meanings and lexical labels. Because problems are described by different people with different expressions, organizing the meanings of words within the vocabulary and standardizing the processing of the vocabulary are essential. To facilitate later analysis, words should be given lexical labels after their meanings have been collated. At that point, a professional vocabulary can be created.
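The new-word collection step described above can be illustrated with a minimal sketch. It assumes the text has already been segmented into tokens and uses a simple count threshold to decide which adjacent pairs to merge. The function names, the `min_count` parameter, and the English stand-in tokens are hypothetical and are not taken from the paper, which may use a different merging criterion.

```python
from collections import Counter


def frequent_adjacent_pairs(docs, min_count=2):
    """Count adjacent token pairs across all documents and keep frequent ones."""
    counts = Counter()
    for tokens in docs:
        # zip(tokens, tokens[1:]) yields each token with its right-hand neighbor.
        counts.update(zip(tokens, tokens[1:]))
    return {pair for pair, n in counts.items() if n >= min_count}


def merge_pairs(tokens, pairs, joiner=" "):
    """Scan left to right, merging two adjacent tokens when the pair is frequent."""
    merged = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in pairs:
            merged.append(tokens[i] + joiner + tokens[i + 1])
            i += 2  # both tokens were consumed by the merge
        else:
            merged.append(tokens[i])
            i += 1
    return merged


# Hypothetical, already-segmented hidden-danger descriptions (English stand-ins).
docs = [
    ["HSE", "management system", "audit", "standard", "missing"],
    ["audit", "standard", "not", "updated"],
    ["HSE", "management system", "record", "incomplete"],
]

pairs = frequent_adjacent_pairs(docs, min_count=2)
vocabulary_docs = [merge_pairs(tokens, pairs) for tokens in docs]
```

Running the two functions again on `vocabulary_docs` would allow longer terms to form from already merged pairs. A raw count threshold is the simplest possible criterion; an association measure could be substituted without changing the overall structure.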