Interpretation Of Test Scores Research Articles

Addressing differential item functioning (DIF) provides validity evidence to support the interpretation of test scores across groups. Conventional DIF methods flag DIF items statistically but often fail to yield a substantive interpretation. The lack of interpretability of DIF results is particularly pronounced in writing assessment, where the matching of test takers’ proficiency levels often relies on external variables and the reported DIF effect is frequently small in magnitude. Using responses to a prompt that showed small gender DIF favoring female test takers, we demonstrate a corpus-based approach that helps address DIF interpretation. To provide linguistic insights into the possible sources of the small DIF effect, this study compared a gender-balanced corpus of 826 writing samples matched by test takers’ performance on the reading and listening components of the test. Four groups of linguistic features that correspond to the rating dimensions, and thus partially represent the writing construct, were analyzed: (1) sentiment and social cognition, (2) cohesion, (3) syntactic features, and (4) lexical features. After initial screening, 123 linguistic features, all of which were correlated with the writing scores, were retained for gender comparison. Among these selected features, female test takers’ writing samples scored higher on six, with small effect sizes, in the categories of cohesion and syntactic features. Three of the six features were positively correlated with writing scores, while the other three were negatively correlated. These results are largely consistent with previous findings of gender differences in language use. Additionally, the small differences in the language features of the writing samples (both the small number of features that differ between genders and the small effect sizes of the observed differences) are consistent with the previous DIF results; both suggest that the effect of gender differences on the writing scores is likely to be very small. In sum, the corpus-based findings provide linguistic insights into gender-related language differences and their potential consequences in a testing context. These findings further our understanding of the small gender DIF effect identified through statistical analysis, which lends support to the validity of the writing scores.
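The abstract reports "small effect sizes" for the six features but does not name the statistic used. As a rough illustration only, the sketch below computes Cohen's d with a pooled standard deviation, one common standardized mean difference for a two-group comparison. The choice of Cohen's d, the function name `cohens_d`, and the feature values are all assumptions made for this example, not details taken from the study.

```python
import math
import statistics


def cohens_d(group_a, group_b):
    """Standardized mean difference (group_a minus group_b) using the pooled SD.

    Assumption: Cohen's d is used here only as a familiar example of an
    effect size; the abstract does not say which measure the authors used.
    """
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")

    # Sample variances (n - 1 in the denominator).
    var_a = statistics.variance(group_a)
    var_b = statistics.variance(group_b)

    # Pooled standard deviation, weighting each variance by its degrees of freedom.
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    if pooled_sd == 0:
        raise ValueError("pooled standard deviation is zero; d is undefined")

    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd


# Hypothetical values of one linguistic feature for two groups of essays.
# These numbers are invented for illustration and are NOT data from the study.
feature_group_a = [2, 4, 6]
feature_group_b = [1, 3, 5]
d = cohens_d(feature_group_a, feature_group_b)
```

A positive result means the first group scored higher on the feature. Whether a given value counts as "small" depends on the convention adopted, which the abstract does not specify.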

Background: Good hearing is a fundamental skill that allows children to develop properly, both socially and intellectually. In contrast to defects in inner ear function, however, auditory processing disorders (APDs), which can affect up to 2–3% of schoolchildren, are not easily identified with basic screening programs and must be diagnosed using special tests. Although such psychoacoustic tests are available, the scores achieved depend highly on the social, cultural, and linguistic characteristics of the population, and norms must be established for each population separately. Reference values are still lacking for the Polish population, especially for school-age children, so practitioners must interpret test scores themselves, often intuitively or using potentially biased thresholds from other countries.

Materials and methods: We investigated a sample of 94 Polish schoolchildren with normal hearing, divided into four age groups, from 7-year-olds to 10-year-olds. None of the children had a speech or language development disorder, learning problem, or symptom of APD. Participants were volunteers who had previously taken part in a large screening study. The group consisted of 56 girls (60%) and 38 boys (40%) with an average age of 8.6 years (SD = 1.1). The test battery included the Duration Pattern Test (DPT), Frequency Pattern Test (FPT), Time-Compressed Speech Test (CST), and Dichotic Digit Test (DDT).

Results: The scores on all tests increased consistently with age. The differences among age groups for the DPT, CST, and left- and right-ear DDT tests were significant (Kruskal–Wallis test, p-values = 0.002, 0.006, 0.005, and 0.020, respectively), but the effect of age on the FPT test was not (p-value = 0.143). The analysis showed a clear and significant separation between a merged group of ages 7 and 8 and another of ages 9 and 10. We therefore propose, for each test, separate reference values for these two age groups. Using thresholds based on a 10% quantile, we offer the following reference values for ages 7–8 and 9–10, respectively: DPT, 28.5% and 53.8%; FPT, 18.5% and 27.5%; CST, 68.6% and 77.2%; left-ear DDT, 34.3% and 52.5%; right-ear DDT, 56% and 72.5%.

Conclusion: The scores on psychoacoustic tests used to diagnose APD differ between cultures and linguistic backgrounds. Clinicians should therefore use norms that have been designed for the population most similar to their patients. Here, we report the use of a test battery designed for the Polish language that accounts for various aspects of APD when screening schoolchildren. Together with a full methodology of those tests, we provide norms that can be used as cut-offs in clinical diagnosis. Practitioners are invited to use them to reach more accurate, evidence-based decisions.
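The reference values above are described as thresholds based on a 10% quantile of the scores in each age group. The sketch below shows one way such a cut-off could be computed and applied. The abstract does not state which quantile definition was used, so the linear-interpolation rule, the helper names `quantile` and `is_below_reference`, and the sample scores are assumptions for illustration only.

```python
import math


def quantile(values, q):
    """Return the q-th quantile (0 <= q <= 1) of values.

    Assumption: linear interpolation between neighbouring order statistics;
    the study may have used a different quantile definition.
    """
    if not values:
        raise ValueError("values must not be empty")
    if not 0 <= q <= 1:
        raise ValueError("q must lie between 0 and 1")

    ordered = sorted(values)
    position = q * (len(ordered) - 1)  # fractional index into the sorted list
    lower = math.floor(position)
    upper = math.ceil(position)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


def is_below_reference(score, cutoff):
    """Flag a score that falls below the quantile-based reference value."""
    return score < cutoff


# Hypothetical percent-correct scores for one test in one age group.
# These numbers are invented for illustration and are NOT data from the study.
example_scores = [10, 20, 30, 40, 50, 60, 70, 80, 90, 95, 100]
cutoff = quantile(example_scores, 0.10)
```

With a cut-off defined this way, roughly one in ten children in the normative sample scores below it by construction, so a score under the cut-off signals a need for follow-up and is not a diagnosis on its own.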

Articles published on Interpretation Of Test Scores

Using Corpus Analyses to Help Address the DIF Interpretation: Gender Differences in Standardized Writing Assessment.

Group Differences in the Value of Subscores: A Fairness Issue

Validity Evidence in Science Achievement Assessments Found in a Sample of Published Research Articles on Science Teaching

The Redesigned TOEIC Bridge® Tests: Relations to Test‐Taker Perceptions of Proficiency in English

ICT Engagement: a new construct and its assessment in PISA 2015

Interrater and Test-Retest Reliability of Performance-Based Clinical Tests Administered to Established Users of Lower Limb Prostheses.

Assessing subjective and objective information literacy at upper secondary schools - an empirical study in four German-speaking countries

Assessment of Item Response Model-Data Fit Via Bayesian Limited Information Model Comparison Posterior Predictive Checks

The effect of rolling walker use on interpretation of Timed Up and Go test scores: a preliminary study.

Mapping the TOEFL iBT® Test Scores to China's Standards of English Language Ability: Implications for Score Interpretation and Use

Interpreting Test Scores for Compensatory Education Students

Word-reading ability as a “hold test” in cognitively normal young adults with history of concussion and repetitive head impact exposure: A CARE Consortium Study

A new revised Graded Naming Test and new normative data including older adults (80-97 years).

Fitting MD analysis in an argument-based validity framework for writing assessment: Explanation and generalization inferences for the ECPE

Questionnaire validation practice: a protocol for a systematic descriptive literature review of health literacy assessments

Reference values for psychoacoustic tests on Polish school children 7-10 years old.

Students’ perceptions of assessment quality related to their learning approaches and learning outcomes

Ethics and Fairness in Assessing Learning Outcomes in Higher Education

Validating Test Score Interpretations Using Time Information.

Maximum effort may not be required for valid intelligence test score interpretations
