People tend to categorize the valence of a target stimulus more quickly and accurately when the target follows a prime stimulus of the same valence than when it follows a prime of the opposite valence. The evaluative priming task (EPT) uses this priming effect to measure evaluations indirectly, but it suffers from low reliability. The present research compared the reliability and validity of 2,160 EPT scoring algorithms across 12 datasets. Current norms are to delete trials with error responses and to rely solely on differences in response latency between task conditions. In contrast, scoring algorithms performed better when they incorporated both latency and accuracy data, and when they based the score on differences between conditions in the mean rank of their trials by performance (a scale-invariant, non-parametric dominance score named the G score). We recommend seven new scoring algorithms that, compared with current scoring norms, increase internal consistency by a mean of 41% and correlations with other measures by 17%.
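
To make the idea of a rank-based dominance score concrete, the following is a minimal sketch in the spirit of the G score, not the published scoring algorithm. The function names, the trial tuple layout, the condition labels, and the choice to treat all error trials as tied for worst performance are assumptions made for illustration. The sketch ranks all trials by performance, takes the difference in mean rank between incongruent and congruent trials, and rescales it to the range -1 to 1.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg = (start + end) / 2 + 1  # mean of 1-based positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = avg
        start = end + 1
    return ranks


def rank_dominance_score(trials):
    """Hypothetical rank-based dominance score for one participant.

    trials: list of (condition, latency_ms, correct) tuples, where condition
    is "congruent" or "incongruent". Positive scores mean that incongruent
    trials tended to show worse performance than congruent trials.
    """
    # Assumption: every error trial counts as worse than the slowest trial,
    # and all error trials are tied with one another.
    slowest = max(latency for _, latency, _ in trials)
    badness = [latency if correct else slowest + 1 for _, latency, correct in trials]
    ranks = average_ranks(badness)
    congruent = [r for r, t in zip(ranks, trials) if t[0] == "congruent"]
    incongruent = [r for r, t in zip(ranks, trials) if t[0] == "incongruent"]
    # Dividing the mean-rank difference by N/2 bounds the score in [-1, 1].
    return 2 * (mean(incongruent) - mean(congruent)) / len(trials)


example_trials = [
    ("congruent", 500, True),
    ("congruent", 520, True),
    ("congruent", 610, True),
    ("incongruent", 580, True),
    ("incongruent", 640, True),
    ("incongruent", 700, False),  # error trial, kept rather than deleted
]
score = rank_dominance_score(example_trials)
print(round(score, 3))  # 0.778
```

Because the score depends only on the ordering of trials, multiplying every latency by a constant leaves it unchanged, which is one sense in which such a score is scale-invariant. Error trials stay in the data and simply receive the worst ranks, so accuracy and latency both contribute.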