How Extreme Is It Anyways?: An Empirical Investigation Into the Prevalence and Strength of Extreme Response Style.

Abstract

Extreme response style (ERS), the tendency of participants to endorse the extreme categories of an item partially independent of item content, has repeatedly been found to decrease the validity of Likert-type scale results. For this reason, many item response theory (IRT) models have been developed that attempt to detect and correct for ERS. Despite the substantial literature on ERS and its modeling, several important questions remain. To date, there is no clear estimate of how often ERS occurs in practice across a variety of scales and populations. In addition, there is little guidance on which item parameters for ERS models are commonly found in empirical data, although this information is crucial for informing future methodological studies that use ERS models. Finally, there is only limited information available on which ERS models tend to fit the data best. The current study sets out to address these three issues by analyzing data from the Programme for International Student Assessment (PISA) using a generalized partial credit model, several multidimensional nominal response models (MNRMs), and several IRTree models. Results indicate an extremely high prevalence of ERS across scales, populations, and timepoints. Item parameters for future methodological studies are presented, and a general preference for IRTree models over MNRMs is found in many datasets. Implications for future studies are discussed, and recommendations for practice are made.

Similar Papers
  • Research Article
  • Cite count: 3
  • 10.1177/21582440221108168
The Impact of Extreme Response Style on the Mean Comparison of Two Independent Samples
  • Apr 1, 2022
  • Sage Open
  • Yingbin Zhang + 2 more

Extreme response style (ERS) is prevalent in survey research using rating scales. It may cause biased results in group comparisons. This research conducted two sets of simulation studies to explore the magnitude of the ERS impact on mean comparisons between two independent samples. Data were generated from a multidimensional nominal response model. Study 1 examined the influence of ERS on the estimate of group differences in the variable of interest. The results indicated that ERS led to biased estimates, especially when the groups differed significantly in ERS. The correlation between ERS and the variable of interest also moderated the ERS impact. The results were illustrated with an empirical example. Study 2 investigated the impact of ERS on the type I and type II error rates of the independent t-test based on scale scores. When the variable of interest had no true difference between groups, ERS inflated the type I error rate. When a true difference existed, ERS inflated the type II error rate. The true group differences in ERS and in the variable of interest, unequal ERS variances, the correlation between ERS and the variable of interest, and the number of items moderated the impact of ERS on type I and type II error rates. The implications for practice and further research are discussed.

  • Research Article
  • Cite count: 11
  • 10.1177/00131644231155838
Correcting for Extreme Response Style: Model Choice Matters.
  • Feb 17, 2023
  • Educational and Psychological Measurement
  • Martijn Schoenmakers + 3 more

Extreme response style (ERS), the tendency of participants to select extreme item categories regardless of the item content, has frequently been found to decrease the validity of Likert-type questionnaire results. For this reason, various item response theory (IRT) models have been proposed to model ERS and correct for it. Comparisons of these models are, however, rare in the literature, especially in the context of cross-cultural comparisons, where ERS is even more relevant due to cultural differences between groups. To remedy this issue, the current article examines two frequently used IRT models that can be estimated using standard software: a multidimensional nominal response model (MNRM) and an IRTree model. Studying conceptual differences between these models reveals that they differ substantially in their conceptualization of ERS. These differences result in different category probabilities between the models. To evaluate the impact of these differences in a multigroup context, a simulation study is conducted. Our results show that when the groups differ in their average ERS, the IRTree model and MNRM can drastically differ in their conclusions about the size and presence of differences in the substantive trait between these groups. An empirical example is given and implications for the future use of both models and the conceptualization of ERS are discussed.
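
To make the phrase "category probabilities" concrete, here is a minimal sketch of how an MNRM with one trait and one ERS dimension could assign probabilities to the five categories of a Likert item. It is not the article's specification: the fixed scoring weights (0 to 4 for the trait, 1 for the two end categories on the ERS dimension), the slope arguments, the intercepts, and the function name `mnrm_category_probs` are all assumptions chosen for illustration.

```python
import math

# Assumed fixed scoring weights for a 5-point item (illustrative only).
TRAIT_WEIGHTS = (0, 1, 2, 3, 4)  # ordinal scoring for the substantive trait
ERS_WEIGHTS = (1, 0, 0, 0, 1)    # only the two end categories load on ERS


def mnrm_category_probs(theta, ers, intercepts, a_trait=1.0, a_ers=1.0):
    """Return the five category probabilities of one item for one respondent.

    theta: substantive trait level; ers: extreme response style level;
    intercepts: five category intercepts (hypothetical values).
    """
    logits = [
        a_trait * s * theta + a_ers * e * ers + c
        for s, e, c in zip(TRAIT_WEIGHTS, ERS_WEIGHTS, intercepts)
    ]
    # Subtract the maximum logit before exponentiating for numerical stability.
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [x / total for x in exps]


# Same trait level, low versus high ERS: probability mass moves to the ends.
flat = mnrm_category_probs(theta=0.0, ers=0.0, intercepts=[0.0] * 5)
extreme = mnrm_category_probs(theta=0.0, ers=2.0, intercepts=[0.0] * 5)
```

In this sketch, raising `ers` while holding `theta` fixed increases the probability of both end categories at once, which is one way of formalizing ERS as a continuous dimension. An IRTree model instead reaches the end categories through a sequence of binary decisions.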

  • Research Article
  • Cite count: 13
  • 10.1177/00131644231206765
Investigating Heterogeneity in Response Strategies: A Mixture Multidimensional IRTree Approach.
  • Nov 9, 2023
  • Educational and Psychological Measurement
  • Ö Emre C Alagöz + 1 more

To improve the validity of self-report measures, researchers should control for response style (RS) effects, which can be achieved with IRTree models. A traditional IRTree model considers a response as a combination of distinct decision-making processes, where the substantive trait affects the decision on response direction, while decisions about choosing the middle category or extreme categories are largely determined by midpoint RS (MRS) and extreme RS (ERS). One limitation of traditional IRTree models is the assumption that all respondents utilize the same set of RS in their response strategies, whereas it can be assumed that the nature and the strength of RS effects can differ between individuals. To address this limitation, we propose a mixture multidimensional IRTree (MM-IRTree) model that detects heterogeneity in response strategies. The MM-IRTree model comprises four latent classes of respondents, each associated with a different set of RS traits in addition to the substantive trait. More specifically, the class-specific response strategies involve (1) only ERS in the "ERS only" class, (2) only MRS in the "MRS only" class, (3) both ERS and MRS in the "2RS" class, and (4) neither ERS nor MRS in the "0RS" class. In a simulation study, we showed that the MM-IRTree model performed well in recovering model parameters and class memberships, whereas the traditional IRTree approach showed poor performance if the population includes a mixture of response strategies. In an application to empirical data, the MM-IRTree model revealed distinct classes with noticeable class sizes, suggesting that respondents indeed utilize different response strategies.
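
The decomposition described above can be sketched as a recoding step. The example assumes a 5-point scale and one common tree structure (midpoint decision first, then direction, then extremity). The function name `irtree_pseudo_items` and the use of `None` for decisions that were never reached are illustrative choices, not taken from the article.

```python
from typing import Optional, Tuple


def irtree_pseudo_items(response: int) -> Tuple[int, Optional[int], Optional[int]]:
    """Recode a 5-point response into (midpoint, direction, extreme) pseudo-items.

    midpoint:  1 if the middle category was chosen, else 0
    direction: 1 for agreement (4 or 5), 0 for disagreement (1 or 2)
    extreme:   1 for an end category (1 or 5), 0 for a moderate one (2 or 4)
    Decisions that are not reached after a midpoint response are None (missing).
    """
    if response not in (1, 2, 3, 4, 5):
        raise ValueError("response must be an integer from 1 to 5")
    if response == 3:
        return (1, None, None)
    direction = 1 if response > 3 else 0
    extreme = 1 if response in (1, 5) else 0
    return (0, direction, extreme)


# Recode one respondent's answers to four items.
recoded = [irtree_pseudo_items(r) for r in (5, 3, 2, 1)]
```

Each pseudo-item can then be modeled with a binary IRT model in which, following the description above, the direction decision loads on the substantive trait while the midpoint and extremity decisions load on MRS and ERS.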

  • Research Article
  • 10.3758/s13428-025-02756-6
Posterior predictive checks for the detection of extreme response style
  • Jan 1, 2025
  • Behavior Research Methods
  • Martijn Schoenmakers + 3 more

Extreme response style (ERS), the tendency of participants to select extreme item categories regardless of the item content, has frequently been found to decrease the validity of Likert-type questionnaire results (e.g., Moors, European Journal of Work and Organizational Psychology, 21, 271–298, 2012). For this reason, detecting ERS at both the group and individual levels is of paramount importance. While various approaches to detecting ERS exist, these may conflate ERS with the trait of interest, require additional questionnaires to be administered, or require the use of mixture or multidimensional IRT models. As an alternative approach to detecting ERS, Bayesian posterior predictive checks (PPCs) may be a viable option. Posterior predictive checking offers a highly customizable framework for detecting model misfit, which can be directly applied to frequently used unidimensional IRT models. Critically, the use of PPCs to detect ERS does not require strong assumptions regarding the nature of ERS, such as ERS being a continuous dimension or a categorical trait. In this paper, we thus apply PPCs to a generalized partial credit model to detect model misfit related to ERS on both the group and person levels. We propose various possible PPCs tailored to ERS, which are illustrated in an empirical example, and their performance in detecting ERS is examined under various conditions. Suggestions for practical applications are provided, and avenues for future research are explored.

  • Research Article
  • Cite count: 8
  • 10.1002/ijop.12287
Extreme response style as a cultural response to climato-economic deprivation.
  • Jun 3, 2016
  • International Journal of Psychology: Journal international de psychologie
  • Jia He + 2 more

We investigated the effects of climato-economic harshness on extreme response style. Climato-economic theorising postulates that a more threatening climate in poorer countries, in contrast to countries with a more comforting climate and richer countries with a more challenging climate, triggers intolerance of ambiguity and uncertainty avoidance inherent to conservatism, in-group favouritism and autocracy. Scores of extreme response style at country level, a proxy of this cluster of cultural characteristics, were extracted from students' responses in the Programme for International Student Assessment to test the hypothesis. In a series of hierarchical regression analyses across 64 countries, cold demands, heat demands and GDP per capita showed a highly significant interaction effect on extreme response style, predicting in total 30.7% of the variance. Extreme response style was highest in poorer countries with higher climatic demands, and lowest in richer countries with lower climatic demands. Implications are discussed.

  • Research Article
  • Cite count: 7
  • 10.1177/00131644231213319
Separation of Traits and Extreme Response Style in IRTree Models: The Role of Mimicry Effects for the Meaningful Interpretation of Estimates.
  • Dec 22, 2023
  • Educational and Psychological Measurement
  • Viola Merhof + 2 more

Item response tree (IRTree) models are a flexible framework to control self-reported trait measurements for response styles. To this end, IRTree models decompose the responses to rating items into sub-decisions, which are assumed to be made on the basis of either the trait being measured or a response style, whereby the effects of such person parameters can be separated from each other. Here we investigate conditions under which the substantive meanings of estimated extreme response style parameters are potentially invalid and do not correspond to the meanings attributed to them, that is, content-unrelated category preferences. Rather, the response style factor may mimic the trait and capture part of the trait-induced variance in item responding, thus impairing the meaningful separation of the person parameters. Such a mimicry effect is manifested in a biased estimation of the covariance of response style and trait, as well as in an overestimation of the response style variance. Both can lead to severely misleading conclusions drawn from IRTree analyses. A series of simulation studies reveals that mimicry effects depend on the distribution of observed responses and that the estimation biases are stronger the more asymmetrically the responses are distributed across the rating scale. It is further demonstrated that extending the commonly used IRTree model with unidimensional sub-decisions by multidimensional parameterizations counteracts mimicry effects and facilitates the meaningful separation of parameters. An empirical example of the Program for International Student Assessment (PISA) background questionnaire illustrates the threat of mimicry effects in real data. The implications of applying IRTree models for empirical research questions are discussed.

  • Research Article
  • Cite count: 20
  • 10.3389/fpsyg.2020.00271
Validity of Three IRT Models for Measuring and Controlling Extreme and Midpoint Response Styles.
  • Feb 21, 2020
  • Frontiers in Psychology
  • Yingbin Zhang + 1 more

Response styles, the general tendency to use certain categories of rating scales over others, are a threat to the reliability and validity of self-report measures. The mixed partial credit model (mPCM), the multidimensional nominal response model, and the item response tree model are three widely used models for measuring extreme and midpoint response styles and correcting their effects. This research aimed to examine and compare their validity by fitting them to empirical data and correlating the content-related factors and the response style-related factors in these models with extraneous criteria. The results showed that the content factors yielded by these models were moderately related to the content criterion and not related to the response style criteria. The response style factors were moderately related to the response style criteria and weakly related to the content criterion. Simultaneous analysis of more than one scale could improve their validity for measuring response styles. These findings indicate that the three models could control and measure extreme and midpoint response styles, though the validity of the mPCM for measuring response styles was not good in some cases. Overall, the multidimensional nominal response model performed slightly better than the other two models.

  • Research Article
  • Cite count: 31
  • 10.1186/s40536-015-0012-0
Examining the attitude-achievement paradox in PISA using a multilevel multidimensional IRT model for extreme response style
  • Aug 18, 2015
  • Large-scale Assessments in Education
  • Yi Lu + 1 more

In this paper, we consider a two-level multidimensional item response model that examines country differences in extreme response style (ERS) as a possible cause for the achievement-attitude paradox in PISA 2006. The model is an extension of Bolt & Newton (2011) that uses response data from seven attitudinal scales to assess response style and to control for its effects in estimating correlations between attitudes and achievement. Despite detectable variability in ERS across countries and detectable biasing effects of ERS on attitudinal scores, our results suggest that the unexpected between-country correlation between attitudes and achievement is not attributable to country differences in ERS. The remaining between-country correlations between mean attitudes and mean achievement once controlling for ERS can be explained by the observation that (1) despite detectable country differences, most variability in ERS occurs within, as opposed to between, countries, and (2) ERS appears to be only weakly correlated with achievement. The methodological approach used in this paper is argued to provide an informative way of studying the effects (or lack thereof) of cross-country variability in response style.

  • Research Article
  • Cite count: 8
  • 10.7203/relieve.22.1.8282
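Correcting for Scale Usage Differences among Latin American Countries, Portugal, and Spain in PISA [Corrigiendo las diferencias de uso de escala entre países de América Latina, Portugal y España en PISA]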
  • Jul 8, 2016
  • RELIEVE - Revista Electrónica de Investigación y Evaluación Educativa
  • Jia He + 1 more

This study investigated the effects of corrections for scale usage preferences in seven Latin American countries, Portugal, and Spain in the student questionnaires of the 2012 Programme for International Student Assessment (PISA). These target countries tend to express strong opinions and to show self-enhancement, which can pose serious threats to the validity of cross-cultural comparisons of the questionnaires. We examined to what extent the score corrections that have been proposed could change the pattern of cultural differences. We corrected for scale usage preferences in a measure of teacher support among 39,045 students in nine countries, based on extreme response style, overclaiming, and anchoring vignettes, respectively. These corrections showed different effects: (1) all correction methods helped improve measurement invariance, although the correction based on anchoring vignettes was less effective in reaching scalar invariance than the corrections for extreme response style and overclaiming; (2) controlling for extreme response style and overclaiming changed the mean score of Spain to a greater extent than in other countries, which appears to present a more realistic pattern, whereas the changes in correlations with other measures were quite limited. Using the anchoring vignette scores led to drastic changes in both means and correlations. A firm conclusion about which method is preferable cannot be offered, since there is no evidence that any method improves the validity of the scores in these countries. The need for and feasibility of correction methods are discussed.

  • Research Article
  • 10.1177/01466216251379471
Distinguishing Between Models for Extreme and Midpoint Response Styles as Opposite Poles of a Single Dimension versus Two Separate Dimensions: A Simulation Study.
  • Sep 13, 2025
  • Applied Psychological Measurement
  • Martijn Schoenmakers + 2 more

Extreme and midpoint response styles have frequently been found to decrease the validity of Likert-type questionnaire results. Different approaches for modelling extreme and midpoint responding have been proposed in the literature, with some advocating for a unidimensional conceptualization of the response styles as opposite poles, and others modelling them as separate dimensions. How these response styles are modelled influences the estimation complexity, parameter estimates, and detection of and correction for response styles in IRT models. For these reasons, we examine if it is possible to empirically distinguish between extreme and midpoint responding as two separate dimensions versus two opposite sides of a single dimension. The various conceptualizations are modelled using the multidimensional nominal response model, with the AIC and BIC being used to distinguish between the competing models in a simulation study and an empirical example. Results indicate good performance of both information criteria given sufficient sample size, test length, and response style strength. The BIC outperformed the AIC in cases where no response styles were present, while the AIC outperformed the BIC in cases where multiple response style dimensions were present. Implications of the results for practice are discussed.
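
For readers unfamiliar with the two information criteria, the sketch below computes them from a model's maximized log-likelihood using the standard definitions AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L. The log-likelihoods, parameter counts, and sample size are invented for illustration and do not come from the study.

```python
import math


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian information criterion: k ln(n) - 2 ln L (lower is better)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


# Hypothetical fits of two competing response style models to n = 1000 people.
n_obs = 1000
one_dimension = {"loglik": -5210.0, "k": 40}   # styles as poles of one dimension
two_dimensions = {"loglik": -5202.0, "k": 43}  # styles as separate dimensions

aic_prefers_two = (
    aic(two_dimensions["loglik"], two_dimensions["k"])
    < aic(one_dimension["loglik"], one_dimension["k"])
)
bic_prefers_two = (
    bic(two_dimensions["loglik"], two_dimensions["k"], n_obs)
    < bic(one_dimension["loglik"], one_dimension["k"], n_obs)
)
```

With these made-up numbers the AIC favors the larger model while the BIC favors the smaller one, because the BIC penalty per parameter, ln(n), exceeds the AIC penalty of 2 once n is greater than about 7. This difference in penalty is consistent with the two criteria performing differently depending on how many response style dimensions are truly present.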

  • Research Article
  • Cite count: 32
  • 10.1177/0013164415591848
A Simulation Study on Methods of Correcting for the Effects of Extreme Response Style.
  • Jun 29, 2015
  • Educational and Psychological Measurement
  • Eunike Wetzel + 2 more

The impact of response styles such as extreme response style (ERS) on trait estimation has long been a matter of concern to researchers and practitioners. This simulation study investigated three methods that have been proposed for the correction of trait estimates for ERS effects: (a) mixed Rasch models, (b) multidimensional item response models, and (c) regression residuals. The methods were compared with respect to their ability to recover the true latent trait levels. Data were generated according to a unidimensional model with only one trait, a mixed Rasch model with two populations of ERS and non-ERS, and a two-dimensional model incorporating a trait and an ERS dimension. The data were analyzed using the same models as well as linear regression where the trait estimate is regressed on an ERS score and the resulting residual is considered the corrected trait estimate. Over all conditions, the two-dimensional model achieved the best trait recovery, though the difference from the unidimensional model was rather small. Mixed Rasch models were in general inferior to the other correction methods. When the trait and ERS showed no to weak correlations, ERS appeared to have a minor impact on trait estimation.
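
The regression residual method can be sketched in a few lines. The example assumes that the ERS score is the proportion of end-category responses on a 5-point scale and that some trait estimate is already available for each respondent. Both choices, along with the function names `ers_score` and `residualize`, are illustrative assumptions rather than the study's exact procedure.

```python
from typing import List, Sequence


def ers_score(responses: Sequence[int], n_categories: int = 5) -> float:
    """Proportion of responses that fall in the lowest or highest category."""
    extremes = (1, n_categories)
    return sum(r in extremes for r in responses) / len(responses)


def residualize(trait: Sequence[float], ers: Sequence[float]) -> List[float]:
    """Regress trait estimates on ERS scores (simple OLS) and return residuals.

    The residual is what the method treats as the ERS-corrected trait estimate.
    """
    n = len(trait)
    mean_t = sum(trait) / n
    mean_e = sum(ers) / n
    sxx = sum((e - mean_e) ** 2 for e in ers)
    sxy = sum((e - mean_e) * (t - mean_t) for e, t in zip(ers, trait))
    slope = sxy / sxx if sxx > 0 else 0.0  # no ERS variance: nothing to remove
    intercept = mean_t - slope * mean_e
    return [t - (intercept + slope * e) for t, e in zip(trait, ers)]


# Hypothetical data: four respondents, ERS scores from a separate item set.
ers_scores = [ers_score(r) for r in ([1, 5, 5, 1], [2, 3, 4, 3], [1, 2, 4, 5], [3, 3, 2, 4])]
trait_estimates = [0.9, -0.2, 0.4, -0.5]
corrected = residualize(trait_estimates, ers_scores)
```

Because the residuals are centered by construction, the corrected estimates are interpretable only relative to one another. The correction also removes any genuine covariance between the trait and ERS, so the method should be applied with that limitation in mind.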

  • Research Article
  • Cite count: 70
  • 10.1111/j.1467-9531.2011.01238.x
Dealing with Extreme Response Style in Cross-Cultural Research: A Restricted Latent Class Factor Analysis Approach
  • Jun 14, 2011
  • Sociological Methodology
  • Meike Morren + 2 more

Cross-cultural comparison of attitudes using rating scales may be seriously biased by response styles. This paper deals with statistical methods for detection of and correction for extreme response style (ERS), which is one of the well-documented response styles. After providing an overview of available statistical methods for dealing with ERS, we argue that the latent class factor analysis (LCFA) approach proposed by Moors (2003) has several advantages compared to other methods. Moors' method involves defining a latent variable model which, in addition to the substantive factors of interest, contains an ERS factor. In LCFA the observed ratings can be treated as nominal responses, which is necessary for modeling ERS. We find strong evidence for the presence of ERS and, moreover, find that the groups differ not only in their attitudes but also in ERS. These findings underscore the importance of controlling for ERS when examining attitudes in cross-cultural research.

  • Research Article
  • Cite count: 41
  • 10.3389/fpsyg.2016.01706
Mixture Random-Effect IRT Models for Controlling Extreme Response Style on Rating Scales
  • Nov 2, 2016
  • Frontiers in Psychology
  • Hung-Yu Huang

Respondents are often requested to provide a response to Likert-type or rating-scale items during the assessment of attitude, interest, and personality to measure a variety of latent traits. Extreme response style (ERS), which is defined as a consistent and systematic tendency of a person to locate on a limited number of available rating-scale options, may distort the test validity. Several latent trait models have been proposed to address ERS, but all these models have limitations. Mixture random-effect item response theory (IRT) models for ERS are developed in this study to simultaneously identify the mixtures of latent classes from different ERS levels and detect the possible differential functioning items that result from different latent mixtures. The model parameters can be recovered fairly well in a series of simulations that use Bayesian estimation with the WinBUGS program. In addition, the model parameters in the developed models can be used to identify items that are likely to elicit ERS. The results show that a long test and large sample can improve the parameter estimation process; the precision of the parameter estimates increases with the number of response options, and the model parameter estimation outperforms the person parameter estimation. Ignoring the mixtures and ERS results in substantial rank-order changes in the target latent trait and a reduced classification accuracy of the response styles. An empirical survey of emotional intelligence in college students is presented to demonstrate the applications and implications of the new models.

  • Research Article
  • Cite count: 83
  • 10.1027/1015-5759/a000291
Multidimensional Modeling of Traits and Response Styles
  • Sep 1, 2017
  • European Journal of Psychological Assessment
  • Eunike Wetzel + 1 more

Response styles can influence item responses in addition to a respondent’s latent trait level. A common concern is that comparisons between individuals based on sum scores may be rendered invalid by response style effects. This paper investigates a multidimensional approach to modeling traits and response styles simultaneously. Models incorporating different response styles as well as personality traits (Big Five facets) were compared regarding model fit. Relationships between traits and response styles were investigated and different approaches to modeling extreme response style (ERS) were compared regarding their effects on trait estimates. All multidimensional models showed a better fit than the unidimensional models, indicating that response styles influenced item responses with ERS showing the largest incremental variance explanation. ERS and midpoint response style were mainly trait-independent whereas acquiescence and disacquiescence were strongly related to several personality traits. Expected a posteriori estimates of participants’ trait levels did not differ substantially between two-dimensional and unidimensional models when a set of heterogeneous items was used to model ERS. A minor adjustment of trait estimates occurred when the same items were used to model ERS and the trait, though the ERS dimension in this approach only reflected scale-specific ERS, rather than a general ERS tendency.

  • Research Article
  • Cite count: 16
  • 10.1111/jedm.12205
Modeling Response Styles in Cross‐Country Self‐Reports: An Application of a Multilevel Multidimensional Nominal Response Model
  • Mar 1, 2019
  • Journal of Educational Measurement
  • Unhee Ju + 1 more

We examined the feasibility and results of a multilevel multidimensional nominal response model (ML‐MNRM) for measuring both substantive constructs and extreme response style (ERS) across countries. The ML‐MNRM considers within‐country clustering while allowing overall item slopes to vary across items and examination of whether certain items were more prone to ERS. We applied this model to survey items from TALIS 2013. Results indicated that self‐efficacy items were more likely to trigger ERS compared to need for professional development, and the between‐country relationships among constructs can change due to ERS. Simulations assessed the estimation approach and found adequate recovery of model parameters and factor scores. We stress the importance of additional validity studies to improve the cross‐cultural comparability of substantive constructs.
