Digital Canaries: Identifying Hazardous Patterns in MSHA Data Using a Machine Learner

Olivia Milam, Haroon Malik, Sarah Surber

doi:10.1016/j.procs.2020.10.032


Open Access



Journal: Procedia Computer Science
Publication Date: Jan 1, 2020
License type: CC BY-NC-ND

Affiliation: Marshall University

Abstract

The Mine Safety and Health Administration (MSHA) of the United States is tasked with promoting mine safety through regulations and mine inspections. In performing this duty, MSHA has amassed an immense amount of data. MSHA’s publicly available dataset encompasses millions of records, including mine details, inspections, violations, accidents, and penalties levied for non-compliance with mandatory health and safety standards. The dataset is regularly used to generate government reports and statistics. The goal of this paper is to extract non-evident knowledge from this multi-million-record database. The knowledge discovered would benefit MSHA and mining stakeholders by providing insight into unsafe trends, forecasting potential issues, and supplying information that could help avoid injuries and loss of life. This paper proposes a methodology that uses a machine learner, a random forest classifier, to score safety violations. MSHA and mining stakeholders could then use these scores to help identify potential hazards or undesirable situations. The results show that the model produced by the proposed methodology can predict future incidents with high accuracy.
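The abstract describes scoring violations with a random forest classifier but does not give the feature set or training setup. As a rough illustration only, the following sketch trains a tiny from-scratch random forest (bootstrap samples, a random feature subset at each split, averaged leaf probabilities) on synthetic toy records and treats the averaged positive-class fraction as a violation "score". The four features (significant-and-substantial flag, negligence level, persons affected, prior violations), the labeling rule, and the names `RandomForest`, `hazard_score`, and `make_toy_data` are hypothetical assumptions, not the paper's data or implementation; a production pipeline would more likely rely on a library such as scikit-learn's `RandomForestClassifier` and its `predict_proba` method.

```python
import random

# Toy sketch of random-forest violation scoring. The feature layout is a
# HYPOTHETICAL example, not the paper's feature set:
#   [significant_and_substantial (0/1), negligence_level (0-4),
#    persons_affected (1-10), prior_violations (0-20)]


def gini(labels):
    """Gini impurity of a non-empty list of 0/1 labels."""
    p = sum(labels) / len(labels)
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)


def build_tree(rows, labels, depth, max_depth, n_features, rng):
    """Grow a CART-style tree; each split considers a random feature subset."""
    leaf = sum(labels) / len(labels)  # fraction of positive labels at this node
    if depth >= max_depth or leaf in (0.0, 1.0):
        return leaf
    best = None
    for f in rng.sample(range(len(rows[0])), n_features):
        for t in sorted({r[f] for r in rows}):
            left = [i for i, r in enumerate(rows) if r[f] <= t]
            right = [i for i, r in enumerate(rows) if r[f] > t]
            if not left or not right:
                continue
            # Weighted Gini impurity of the two children (lower is better).
            cost = (len(left) * gini([labels[i] for i in left])
                    + len(right) * gini([labels[i] for i in right]))
            if best is None or cost < best[0]:
                best = (cost, f, t, left, right)
    if best is None:  # no feature in the subset can split this node
        return leaf
    _, f, t, left, right = best

    def child(idx):
        return build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                          depth + 1, max_depth, n_features, rng)

    return (f, t, child(left), child(right))


def predict_tree(node, row):
    """Walk from the root to a leaf and return the leaf's positive fraction."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


class RandomForest:
    def __init__(self, n_trees=25, max_depth=4, n_features=2, seed=0):
        self.n_trees, self.max_depth = n_trees, max_depth
        self.n_features, self.seed = n_features, seed
        self.trees = []

    def fit(self, rows, labels):
        rng = random.Random(self.seed)
        n = len(rows)
        self.trees = []
        for _ in range(self.n_trees):
            idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
            self.trees.append(build_tree([rows[i] for i in idx],
                                         [labels[i] for i in idx], 0,
                                         self.max_depth, self.n_features, rng))
        return self

    def hazard_score(self, row):
        """Average of per-tree leaf fractions: a score in [0, 1]."""
        return sum(predict_tree(tree, row) for tree in self.trees) / len(self.trees)


def make_toy_data(n=200, seed=1):
    """Synthetic records; the labeling rule is purely illustrative."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        row = [rng.randint(0, 1), rng.randint(0, 4),
               rng.randint(1, 10), rng.randint(0, 20)]
        rows.append(row)
        # Toy rule: an incident "follows" an S&S violation with high negligence.
        labels.append(1 if row[0] == 1 and row[1] >= 3 else 0)
    return rows, labels


rows, labels = make_toy_data()
forest = RandomForest().fit(rows, labels)
high = forest.hazard_score([1, 4, 5, 10])  # S&S violation, high negligence
low = forest.hazard_score([0, 0, 5, 10])   # non-S&S violation, no negligence
print(f"high-risk score: {high:.2f}, low-risk score: {low:.2f}")
```

Because each leaf stores the fraction of positive training labels that reached it, averaging leaves across trees yields a graded score between 0 and 1 rather than a hard yes/no vote; ranking violations by such a score is one way stakeholders could prioritize records for closer review.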
