Abstract

Contextual representation has taken center stage in Natural Language Processing (NLP) in the recent past. Models such as Bidirectional Encoder Representations from Transformers (BERT) have found tremendous success in the arena. As a first attempt in the mining industry, in the current work, BERT architecture is adapted in developing the MineBERT model to accomplish the classification of accident narratives from the US Mine Safety and Health Administration (MSHA) data set. In the past multi-year research, several machine learning (ML) methods were used by authors to improve classification success rates in nine significant MSHA accident categories. Out of nine, for four major categories (“Type Groups”) and five “narrow groups”, Random Forests (RF) registered 75% and 42% classification success rates, respectively, on average, while keeping the false positives under 5%. Feature-based innovative NLP methods such as accident-specific expert choice vocabulary (ASECV) and similarity score (SS) methods were developed to improve upon the RF success rates. A combination of all these methods (“Stacked” approach) is able to slightly improve success over RF (71%) to 73.28% for the major category “Caught-in”. Homographs in narratives are identified as the major problem that was preventing further success. Their presence was creating ambiguity problems for classification algorithms. Adaptation of BERT effectively solved the problem. When compared to RF, MineBERT implementation improved success rates among major and narrow groups by 13% and 32%, respectively, while keeping the false positives under 1%, which is very significant. However, BERT implementation in the mining industry, which has unique technical aspects and jargon, brought a set of challenges in terms of preparation of data, selection of hyperparameters, and fine-tuning the model to achieve the best performance, which was met in the current research.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call