Injury Narratives Research Articles

Introduction: Classical machine learning (ML) models have been found to assign external-cause-of-injury codes (E-codes) to injury narratives with good overall accuracy, but they often struggle with rare categories, primarily because of too few training cases and the heavily skewed nature of injury data. In this paper, we (a) studied the effect of increasing the size of training data on the prediction performance of three classical ML models: Multinomial Naïve Bayes (MNB), Support Vector Machine (SVM), and Logistic Regression (LR); and (b) studied the effect of filtering based on the prediction strength of the LR model when the model is trained on very small (10,000 cases) and very large (450,000 cases) training sets.

Method: Data from the Queensland Injury Surveillance Unit for the years 2002–2012, categorized into 20 broad E-codes, were used for this study. Eleven randomly chosen training sets ranging in size from 10,000 to 450,000 cases were used to train the ML models, and prediction performance was analyzed on a prediction set of 50,150 cases. The filtering approach was tested on LR models trained on the smallest and largest training datasets. Sensitivity was used as the performance measure for individual categories. Weighted average sensitivity (WAvg) and unweighted average sensitivity (UAvg) were used as the measures of overall performance. The filtering approach was also tested for estimating category counts and was compared with two other approaches: summing prediction probabilities and counting the direct predictions of the ML model.

Results: The overall performance of all three ML models improved as the size of the training data increased. The overall sensitivities at the maximum training size were similar for the LR and SVM models (∼82%) and higher than for MNB (76%). For all the ML models, the sensitivities of rare categories improved with increasing training data, but they remained considerably lower than the sensitivities of larger categories. With increasing training data size, LR and SVM exhibited diminishing improvement in UAvg, whereas the improvement was relatively steady for MNB. Filtering based on the prediction strength of the LR model (with manual review of the filtered cases) helped improve the sensitivities of rare categories. A sizeable portion of cases still needed to be filtered even when the LR model was trained on the very large training set. For estimating category counts, the filtering approach provided the best estimates for most E-codes, and the summing-prediction-probabilities approach provided better estimates for rare categories.

Conclusions: Increasing the size of training data alone cannot solve the problem of poor classification performance on rare categories by ML models. Filtering could be an effective strategy to improve classification performance on rare categories when large training data are not available.
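The performance measures and the filtering idea described in this abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the toy data, the 0.7 threshold, and the premise that manual review codes every flagged case correctly. It is not the authors' implementation, and it uses hand-written probabilities rather than a trained LR model.

```python
from collections import Counter

def per_category_sensitivity(true_codes, predicted_codes):
    """Sensitivity per category: correctly predicted cases / true cases."""
    totals = Counter(true_codes)
    hits = Counter(t for t, p in zip(true_codes, predicted_codes) if t == p)
    return {code: hits[code] / totals[code] for code in totals}

def weighted_average(sensitivities, true_codes):
    """WAvg: each category's sensitivity weighted by its number of cases."""
    totals = Counter(true_codes)
    return sum(sensitivities[c] * totals[c] for c in totals) / len(true_codes)

def unweighted_average(sensitivities):
    """UAvg: plain mean over categories, so rare categories count equally."""
    return sum(sensitivities.values()) / len(sensitivities)

def filter_by_strength(probabilities, threshold):
    """Split case indices into auto-coded and flagged-for-review lists.

    `probabilities` holds one dict per case mapping code -> predicted
    probability. A case is flagged when its top probability is below
    `threshold` (a hypothetical cut-off, not a value from the study).
    """
    auto, review = [], []
    for i, probs in enumerate(probabilities):
        (auto if max(probs.values()) >= threshold else review).append(i)
    return auto, review

def counts_by_summed_probability(probabilities):
    """Estimate category counts by summing predicted probabilities."""
    totals = {}
    for probs in probabilities:
        for code, p in probs.items():
            totals[code] = totals.get(code, 0.0) + p
    return totals

# Toy data (invented for illustration): 8 common cases and 2 rare cases.
true_codes = ["fall"] * 8 + ["burn"] * 2
probabilities = [{"fall": 0.9, "burn": 0.1}] * 8 + [
    {"fall": 0.2, "burn": 0.8},    # rare case, predicted correctly
    {"fall": 0.55, "burn": 0.45},  # rare case, weak and wrong prediction
]
predicted = [max(p, key=p.get) for p in probabilities]

sens = per_category_sensitivity(true_codes, predicted)
wavg = weighted_average(sens, true_codes)
uavg = unweighted_average(sens)

auto, review = filter_by_strength(probabilities, threshold=0.7)
# Assumption: a human reviewer codes every flagged case correctly.
reviewed = list(predicted)
for i in review:
    reviewed[i] = true_codes[i]
sens_after = per_category_sensitivity(true_codes, reviewed)

summed = counts_by_summed_probability(probabilities)
direct = Counter(predicted)

print("sensitivity:", sens, "WAvg:", wavg, "UAvg:", uavg)
print("flagged for review:", review, "after review:", sens_after)
print("direct counts:", dict(direct), "summed probabilities:", summed)
```

In this toy set the single misclassified rare case barely moves WAvg but pulls UAvg down sharply, which is why the two averages are reported side by side. The weak prediction is also the one that falls below the threshold, so it is the case routed to manual review.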

Background: In occupational safety research, narrative text analysis has been combined with coded surveillance data to improve identification and understanding of injuries and their circumstances. Injury data give information about incidence and the direct cause of an injury, while near-miss data enable the identification of various hazards within an organization or industry. Further, near-miss data provide an opportunity for surveillance and risk reduction. The National Firefighter Near-Miss Reporting System (NFFNMRS) is a voluntary reporting system that collects narrative text data on near-miss and injurious events within the fire and emergency services industry. In recent research, autocoding techniques using Bayesian models have been used to categorize/code injury narratives with up to 90% accuracy, thereby reducing the amount of human effort required to manually code large datasets. Autocoding techniques have not yet been applied to near-miss narrative data.

Methods: We manually assigned mechanism of injury codes to previously un-coded narratives from the NFFNMRS and used this as a training set to develop two Bayesian autocoding models, Fuzzy and Naïve. We calculated sensitivity, specificity and positive predictive value for both models. We also evaluated the effect of training set size on prediction sensitivity and compared the models’ predictive ability as related to injury outcome. We cross-validated a subset of the prediction set for accuracy of the model predictions.

Results: Overall, the Fuzzy model performed better than Naïve, with a sensitivity of 0.74 compared to 0.678. Where Fuzzy and Naïve shared the same prediction, the cross-validation showed a sensitivity of 0.602. As the number of records in the training set increased, the models performed at a higher sensitivity, suggesting that both the Fuzzy and Naïve models were essentially “learning”. Injury records were predicted with greater sensitivity than near-miss records.

Conclusion: We conclude that the application of Bayesian autocoding methods can successfully code both near misses and injuries in longer-than-average narratives with non-specific prompts regarding injury. Such coding allowed for the creation of two new quantitative data elements for injury outcome and injury mechanism.
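The three evaluation measures named in the Methods (sensitivity, specificity and positive predictive value) can be computed per category by treating that category as "positive" and all others as "negative". The sketch below is a minimal illustration under that one-vs-rest assumption; the function name and the toy codes are hypothetical and do not come from the study.

```python
def one_vs_rest_metrics(true_codes, predicted_codes, code):
    """Sensitivity, specificity and PPV for one category against all others."""
    tp = fp = fn = tn = 0
    for t, p in zip(true_codes, predicted_codes):
        if t == code and p == code:
            tp += 1  # category present and predicted
        elif t == code:
            fn += 1  # category present but missed
        elif p == code:
            fp += 1  # category predicted but absent
        else:
            tn += 1  # category absent and not predicted

    def ratio(numerator, denominator):
        # Return None when a measure is undefined (no cases in the denominator).
        return numerator / denominator if denominator else None

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
    }

# Toy manually assigned codes and model predictions (invented for illustration).
manual = ["fall", "fall", "fall", "burn", "burn", "struck"]
model = ["fall", "fall", "burn", "burn", "fall", "struck"]

fall_metrics = one_vs_rest_metrics(manual, model, "fall")
print(fall_metrics)
```

Returning `None` for an undefined ratio, rather than zero, keeps a category with no predicted cases from being mistaken for one with a PPV of 0.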

Articles published on Injury Narratives

Semi-automated text mining strategies for identifying rare causes of injuries from emergency room triage data

Resisting Corporeal Boundaries in Body Work and Knowledge Work

The junior to senior transition: a narrative analysis of the pathways of two Swedish athletes

Discourse Formulation and Neurovascular Activation in Four Genres

Improving autocoding performance of rare categories in injury classification: Is more training data or filtering the solution?

Pediatric moped-related injuries in the United States from 2002 to 2014: Age-related comparisons of mechanisms and outcomes.

Safety of union home care aides in Washington State

Using the narratives of Ontarians with a work-related traumatic brain injury to inform injury prevention: A mixed methods approach.

Classifying injury narratives of large administrative databases for surveillance—A practical approach combining machine learning ensembles and human review

Assessing the completeness of coded and narrative data from the Victorian Emergency Minimum Dataset using injuries sustained during fitness activities as a case study.

Toy safety surveillance from online reviews

Identifying and mitigating risks for agricultural injury associated with obesity.

Computerized “Learn-As-You-Go” classification of traumatic brain injuries using NEISS narrative data

Harnessing information from injury narratives in the ‘big data’ era: understanding and applying machine learning for injury surveillance

Comparison of methods for auto-coding causation of injury narratives

A practical tool for public health surveillance: Semi-automated coding of short injury narratives from large administrative databases using Naïve Bayes algorithms

Why do farmworkers delay treatment after debilitating injuries? Thematic analysis explains if, when, and why farmworkers were treated for injuries.

Worker Injuries Involving the Interaction of Cattle, Cattle Handlers, and Farm Structures or Equipment.

Making the most of injury surveillance data: Using narrative text to identify exposure information in case-control studies

Near-miss narratives from the fire service: A Bayesian analysis

