In many organizations conducting injury surveillance, human coders routinely assign external-cause-of-injury codes (E-codes) to short incident narratives transcribed by triage nurses or others in hospital emergency departments and other settings. Machine learning (ML) models trained on coded injury narratives can accurately assign E-codes to a large portion of the data, but tend to perform poorly on cases that fall into rare categories. In this study, we examined several ways of filtering the predictions of Logistic Regression and Naïve Bayes classifiers so that cases likely to belong to rare categories could be routed to human review. The data were a manually coded emergency department triage dataset of approximately 500,000 cases, collected between 2002 and 2012 and provided by the Queensland Injury Surveillance Unit. The ML models were trained on 90% of the data, and the filtering approaches were evaluated on a prediction set composed of the remaining cases. A cost analysis was also performed to compare the efficiency of each filtering method. The results showed that each filtering method greatly improved the ability to detect rare categories. Filtering using expert-designed causal linguistic rules combined with Logistic Regression prediction strength was found to be the most efficient approach. Several completely automated filtering approaches were also found to be effective.
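As a rough illustration of filtering on prediction strength, the following minimal sketch routes a case to human review when the classifier's highest predicted probability falls below a cutoff. The function name, the 0.8 threshold, the category labels, and the probabilities are all hypothetical and chosen only for illustration; they are not taken from the study, whose filtering criteria may differ.

```python
from typing import Dict, List, Tuple


def split_by_prediction_strength(
    predictions: List[Dict[str, float]],
    threshold: float = 0.8,  # hypothetical cutoff, not a value from the study
) -> Tuple[List[Tuple[int, str]], List[int]]:
    """Split cases into machine-coded cases and cases flagged for human review.

    Each element of `predictions` maps candidate E-code labels to the
    classifier's predicted probability for one narrative.
    """
    auto_coded: List[Tuple[int, str]] = []
    needs_review: List[int] = []
    for idx, probs in enumerate(predictions):
        # The label with the highest predicted probability is the model's choice.
        best_code = max(probs, key=probs.get)
        if probs[best_code] >= threshold:
            auto_coded.append((idx, best_code))
        else:
            # Weak predictions are more likely to hide rare categories.
            needs_review.append(idx)
    return auto_coded, needs_review


# Illustrative (made-up) probabilities for three narratives.
example = [
    {"fall": 0.95, "struck_by": 0.03, "poisoning": 0.02},
    {"fall": 0.40, "struck_by": 0.35, "poisoning": 0.25},
    {"fall": 0.10, "struck_by": 0.05, "poisoning": 0.85},
]
auto_coded, needs_review = split_by_prediction_strength(example, threshold=0.8)
```

Raising the threshold sends more cases to human coders and lowers the share coded automatically, which is the kind of trade-off a cost analysis would weigh.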