Objective: The objective of this study is to present a novel framework, termed the knockoff technique, for evaluating different metric-ranking algorithms used to describe the human response to injury.

Methods: Many biomechanical metrics are routinely obtained from impact tests using postmortem human surrogates (PMHS) to develop injury risk curves (IRCs). IRCs form the basis for evaluating human safety in crashworthiness environments. The biomechanical metrics should be chosen based on some measure of their predictive ability. Commonly used algorithms for ranking the metrics include (a) areas under the receiver operating characteristic curve (AUROC), time-varying AUROC, and other adaptations, and (b) variants of predictive squared error loss. This article develops a rigorous framework to evaluate such metric selection/ranking algorithms. Actual experimental data are used because of the shortcomings of simulated data. The knockoff data are meshed into the existing experimental data using advanced statistical algorithms, and error rate measures such as false discovery rates (FDRs) and bias are calculated using the knockoff technique. Experimental data are taken from previously published whole-body PMHS side impact sled tests. The experiments were conducted at different velocities, with padded and rigid load-wall conditions and offsets, and with different supplemental restraint systems. The PMHS specimens were subjected to a single lateral impact loading, resulting in injury and noninjury outcomes.

Results: A total of 25 metrics were used from 42 tests. The AUROC-type algorithms tended to have higher FDRs than the squared error loss–type functions (45.3% for the best AUROC-type algorithm versus 31.4% for the best Brier score algorithm). Standard errors for the Brier score algorithm also tended to be lower, indicative of more stable metric choices and more robust rankings. The wide variations observed in the performance of the algorithms demonstrate the need for data set–specific evaluation tools such as the knockoff technique developed in this study.

Conclusions: In the present data set, the AUROC and related binary classification algorithms led to inflated FDRs, rendering metric selection/ranking questionable. This is particularly true for data sets with a high proportion of censoring. Squared error loss–type algorithms (such as the Brier score algorithm or its modifications) improved performance in the metric selection process. The knockoff technique presented here may fundamentally change how IRCs are developed from impact experiments or simulations. At the very least, it demonstrates the need for evaluations among different metric ranking/selection algorithms, especially when they produce substantially different biomechanical metric choices. Rather than recommending AUROC-type or Brier score–type algorithms universally, the authors suggest careful assessment of these algorithms using the proposed framework so that a robust algorithm can be chosen with respect to the nature of the experimental data set. Although results are presented for data sets from a published series of experiments, the authors are applying the knockoff technique to tests relevant to the automotive, aviation, military, and other environments.
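To make the evaluation idea concrete, the following is a minimal sketch (not the authors' implementation) of how knockoff copies of candidate metrics can be used to estimate an FDR-type error rate for a ranking algorithm. It assumes a binary injury outcome and a metric matrix, uses a simplified second-order Gaussian knockoff construction rather than the advanced meshing algorithms described in the article, ranks metrics one at a time with either AUROC or a Brier-score-type criterion (via scikit-learn's roc_auc_score and brier_score_loss), and treats the fraction of knockoff columns among the top-ranked metrics as an illustrative FDR proxy. All function names and parameter choices beyond the scikit-learn calls are hypothetical.

```python
# Sketch only: knockoff-based evaluation of metric-ranking algorithms.
import numpy as np
from sklearn.metrics import roc_auc_score, brier_score_loss
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def gaussian_knockoffs(X):
    """Simplified equicorrelated second-order Gaussian knockoffs (illustrative only)."""
    n, p = X.shape
    mu = X.mean(axis=0)
    Sigma = np.cov(X, rowvar=False) + 1e-6 * np.eye(p)
    s = np.full(p, min(2.0 * np.linalg.eigvalsh(Sigma).min(), 1.0))
    Sinv = np.linalg.inv(Sigma)
    mu_k = X - (X - mu) @ Sinv * s                     # conditional mean of knockoffs
    V = 2.0 * np.diag(s) - np.diag(s) @ Sinv @ np.diag(s)
    L = np.linalg.cholesky(V + 1e-6 * np.eye(p))
    return mu_k + rng.standard_normal((n, p)) @ L.T

def rank_metrics(X, y, score="auroc"):
    """Score each column of X as a univariate predictor of the binary outcome y."""
    scores = []
    for j in range(X.shape[1]):
        xj = X[:, [j]]
        if score == "auroc":
            scores.append(roc_auc_score(y, xj.ravel()))
        else:  # Brier-score-type criterion; negated so that larger is better
            p_hat = LogisticRegression().fit(xj, y).predict_proba(xj)[:, 1]
            scores.append(-brier_score_loss(y, p_hat))
    return np.asarray(scores)

def knockoff_fdr(X, y, k=5, score="auroc"):
    """Fraction of knockoff (null) columns among the top-k ranked metrics."""
    aug = np.hstack([X, gaussian_knockoffs(X)])        # originals followed by knockoffs
    top = np.argsort(rank_metrics(aug, y, score))[::-1][:k]
    return float(np.mean(top >= X.shape[1]))           # knockoff columns have index >= p

# Toy illustration with synthetic data standing in for the 25 metrics / 42 tests.
X = rng.standard_normal((42, 25))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.standard_normal(42) > 0).astype(int)
print("AUROC-type FDR proxy:", knockoff_fdr(X, y, score="auroc"))
print("Brier-type FDR proxy:", knockoff_fdr(X, y, score="brier"))
```

Because the knockoff columns are constructed to be null copies of the original metrics, any knockoff appearing among the top-ranked metrics is a false discovery by construction, which is what allows the same data set to be used to compare the error behavior of different ranking criteria.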