ABSTRACT Person misfit and person reliability indices in item response theory (IRT) can play an important role in evaluating the validity of a test or survey instrument at the respondent level. Prior empirical comparisons of these indices have focused on binary item response data and suggest that the two types of indices return very similar results. In this paper, however, we demonstrate an important applied distinction between these methods when applied to polytomously scored rating scale items, namely their differing sensitivities to response style tendencies. Using several empirical datasets, we illustrate settings in which these indices are in one case highly correlated and in two other cases weakly correlated. In the datasets showing a weak correlation between indices, the distinction appears primarily attributable to response style behavior: respondents whose response styles are less common (e.g., a disproportionate selection of the midpoint response) are flagged as misfitting by Drasgow et al.'s person misfit index but often show high levels of person reliability, while the opposite frequently occurs for respondents who over-select the rating scale extremes. We suggest that person misfit reporting be supplemented with an evaluation of person reliability to best understand the validity of measurement at the respondent level when using IRT models with rating scale measures.