Structured Data Extraction Method of Hazard Description Text Based on Strong Part-of-speech Matching

Yi Zhou, Cheng Chen, Aizhi Wu, Xinbo Ai, Peng Zhang

doi:10.1088/1742-6596/1746/1/012056

Abstract

In the field of work safety, hazard description text is an important basis for accident investigation. However, because the text is unstructured, it is inconvenient for subsequent analysis and processing, so structured information needs to be extracted from it. At present, there are few corpora and little annotated data for hazard description text, which makes it difficult to extract structured information with machine learning models. To solve this problem, we propose a new method based on strong part-of-speech pattern matching. The method exploits the short length and relative simplicity of hazard description text: it matches the part-of-speech sequence of the text against predefined patterns and then extracts structured information about the hazard entity and the entity description. The method achieved 86.2% accuracy at a text processing speed of 5514 iters/s while requiring only a small amount of annotated data.
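The matching idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the toy lexicon, the single-letter tag set, the one example pattern, and the reading of "strong" matching as a match over the whole tag sequence are all assumptions made for illustration. A real system would use a trained part-of-speech tagger and a larger pattern set.

```python
import re

# Hypothetical toy lexicon standing in for a real POS tagger.
# Tags: n = noun, v = verb, a = adjective, x = unknown.
TOY_LEXICON = {
    "fire": "n", "extinguisher": "n", "pressure": "n", "cable": "n",
    "is": "v", "insufficient": "a", "exposed": "a", "damaged": "a",
}

# Hypothetical predefined patterns over the tag sequence.
# Each token contributes exactly one tag character, so character
# offsets in the tag string are also token indices.
PATTERNS = [
    re.compile(r"(?P<entity>n+)v?(?P<description>a+)"),
]


def tag(tokens):
    """Map each token to a one-character tag using the toy lexicon."""
    return [TOY_LEXICON.get(token.lower(), "x") for token in tokens]


def extract(text):
    """Return entity and description if the whole tag sequence matches a pattern."""
    tokens = text.split()
    tags = "".join(tag(tokens))
    for pattern in PATTERNS:
        # Assumption: "strong" matching means the full sequence must match.
        match = pattern.fullmatch(tags)
        if match:
            entity = tokens[match.start("entity"):match.end("entity")]
            description = tokens[match.start("description"):match.end("description")]
            return {
                "entity": " ".join(entity),
                "description": " ".join(description),
            }
    return None
```

In this sketch, a text whose tag sequence matches no pattern returns None rather than a partial result, which keeps the extraction conservative at the cost of coverage.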

Full Text