Abstract

In the field of work safety, hazard description text is an important basis for accident investigation. Because this text is unstructured, it is inconvenient to analyze and process, so structured information must be extracted from it. At present, few corpora and little annotated data exist for hazard description text, which makes it difficult to extract structured information with machine learning models. To solve this problem, we propose a new method based on strong part-of-speech pattern matching. The method exploits the short length and relative simplicity of hazard description text: it matches the part-of-speech sequence of the text against predefined patterns and then extracts structured information about the hazard entity and the entity description. The method achieved 86.2% accuracy at a text processing speed of 5514 iters/s while requiring only a small amount of annotated data.
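
The matching step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the tag names, the three patterns, the role labels, and the function name `extract` are all hypothetical, and the input is assumed to be already tokenized and part-of-speech tagged by some external tagger. The sketch also assumes that a pattern matches when it equals a contiguous window of the tag sequence, which may differ from what "strong" matching means in the paper.

```python
from typing import Dict, List, Optional, Tuple

# A tagged sentence: list of (token, POS tag) pairs from an external tagger.
Tagged = List[Tuple[str, str]]

# Hypothetical predefined patterns. Each element pairs a POS tag with the
# role its token plays in the output (None means the token is ignored).
PATTERNS: List[List[Tuple[str, Optional[str]]]] = [
    [("ADJ", "description"), ("NOUN", "entity")],
    [("NOUN", "entity"), ("VERB", None), ("ADJ", "description")],
    [("NOUN", "entity"), ("ADJ", "description")],
]


def extract(tagged: Tagged) -> Optional[Dict[str, str]]:
    """Return the roles filled by the first pattern that matches a
    contiguous window of the POS sequence, or None if nothing matches."""
    tags = [tag for _, tag in tagged]
    for pattern in PATTERNS:
        wanted = [tag for tag, _ in pattern]
        size = len(pattern)
        # Slide a window of the pattern's length over the tag sequence.
        for start in range(len(tags) - size + 1):
            if tags[start:start + size] == wanted:
                window = tagged[start:start + size]
                return {
                    role: token
                    for (token, _), (_, role) in zip(window, pattern)
                    if role is not None
                }
    return None


if __name__ == "__main__":
    sentence = [("the", "DET"), ("guardrail", "NOUN"),
                ("is", "VERB"), ("loose", "ADJ")]
    print(extract(sentence))
```

Because the patterns are plain data, adding a new sentence shape only requires appending another entry to `PATTERNS`, which fits a setting with little annotated data. Patterns are tried in list order, so more specific patterns should be placed before more general ones.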

