Constructed-response Items Research Articles

Constructed-response items, which require students to give more detailed and elaborate responses, are widely used in large-scale assessments. However, manually scoring large volumes of responses against a rubric is labor-intensive and impractical because of rater subjectivity and answer variability. Automatic response coding, such as the automatic scoring of short answers, has become a critical component of learning and assessment systems. In this paper, we propose an interactive coding system called ASSIST that efficiently scores student responses using expert knowledge and then generates an automatic score classifier. First, the ungraded responses are clustered to generate specific codes, representative responses, and indicator words. A constraint set built from expert feedback is used as training data for metric learning to compensate for machine bias. Meanwhile, a classifier mapping responses to codes is trained on the clustering results. Second, the experts review each coded cluster, together with its representative responses and indicator words, and assign it a score. The coded cluster and score pairs are validated to ensure inter-rater reliability. Finally, the classifier is used to score new responses with out-of-distribution detection, which is based on the similarity between the response representation and the class proxy, i.e., the weight of the class in the last linear layer of the classifier. The originality of the system stems from the interactive response clustering procedure, which incorporates expert feedback, and from an adaptive automatic classifier that can identify new response classes. The proposed system is evaluated on our real-world assessment dataset. The experimental results demonstrate the effectiveness of the proposed system in saving human effort and improving scoring performance. The average improvements in clustering quality and scoring accuracy are 14.48% and 18.94%, respectively. Additionally, we report the inter-rater reliability, out-of-distribution rate, and cluster statistics before and after interaction.
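
The out-of-distribution step described in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the use of cosine similarity, the fixed threshold of 0.5, and the names `score_with_ood`, `representation`, and `class_weights` are all assumptions made here for illustration. The abstract states only that detection relies on the similarity between the response representation and the class proxy (the class weight in the last linear layer).

```python
import numpy as np


def score_with_ood(representation, class_weights, threshold=0.5):
    """Assign a response to its most similar class proxy, or flag it as OOD.

    representation: 1-D array, the encoded response (hypothetical encoder output).
    class_weights:  2-D array, one row per class, taken from the classifier's
                    last linear layer and used as the class proxies.
    threshold:      assumed cut-off; below it the response is treated as
                    out-of-distribution (a possible new response class).
    Returns (class_index or None, best_similarity).
    """
    # Normalise so that the dot product equals cosine similarity (an assumption).
    rep = representation / np.linalg.norm(representation)
    proxies = class_weights / np.linalg.norm(class_weights, axis=1, keepdims=True)

    similarities = proxies @ rep
    best = int(np.argmax(similarities))
    best_similarity = float(similarities[best])

    if best_similarity < threshold:
        return None, best_similarity  # no proxy is close enough: treat as OOD
    return best, best_similarity


# Toy example with two classes in a 2-D representation space.
weights = np.array([[1.0, 0.0], [0.0, 1.0]])
in_dist = score_with_ood(np.array([0.9, 0.1]), weights)
out_dist = score_with_ood(np.array([-1.0, -1.0]), weights)
```

In this toy example the first response lies close to the proxy of class 0 and is assigned to it, while the second is dissimilar to both proxies and is returned with `None`, which in a full system would route it back to the experts for review.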

With the introduction of innovative items in the computer-based NAEA, item formats have diversified, and the current practice of selecting anchor items only from multiple-choice (MC) items has become burdensome for test developers. The purpose of this study is to examine whether a mixed-format anchor set including both MC and constructed-response (CR) items can maintain the stability of equating in the NAEA, and to suggest a direction for expanding the range of anchor items and making test construction more flexible. A simulation study was conducted based on a common-item non-equivalent groups design under the test conditions of the Korean subject in the NAEA. The treatment conditions were (1) the type of anchor item set (MC-only or mixed), (2) the composition of the mixed anchor set (MC:CR = 1:1, 3:2, or 3:4), (3) the proportion of the test made up of mixed anchor items (25%, 22%, or 19%), and (4) the presence of multidimensionality associated with item format. The main result is that there was no significant difference in equating error between the MC-only and the mixed anchor conditions. Specifically, the equating error of the MC-only condition was smaller than that of the mixed-format conditions with MC:CR ratios of 1:1 and 3:2, whereas in the 3:4 mixed condition the equating error was smaller than that of the MC-only condition. The length of the mixed anchor set showed somewhat mixed results, probably because the proportions varied little across conditions, all being close to 20% of the test. When the test was multidimensional with respect to item format, the equating error with mixed anchor items increased significantly compared with the unidimensional condition. In sum, a mixed anchor set making up about 20% of the test, with a slightly higher proportion of CR items, was found to be suitable for unidimensional tests, and caution is called for when using mixed anchor items in multidimensional tests.

Articles published on Constructed-response Items

Learning to Score: A Coding System for Constructed Response Items via Interactive Clustering

Automated Scoring of Constructed Response Items in Math Assessment Using Large Language Models

Impact of Violating Unidimensionality on Rasch Calibration for Mixed-Format Tests

New Tests of Rater Drift in Trend Scoring

Resolving and Re-Scoring Constructed Response Items in Mixed-Format Assessments: An Exploration of Three Approaches

Combining machine translation and automated scoring in international large-scale assessments

Analysis of Mixed-Format Assessments Using Measurement Models and Topic Modeling

Transformation Model of History Learning in Increasing Student Competency

Diagnosing the Performance of the Delta Method for Estimating Standard Errors of IRT Test Equating under the Random Groups Design

A Comparison of Latent Semantic Analysis and Latent Dirichlet Allocation in Educational Measurement

Exploring examinees' responses to constructed response items with a supervised topic model

Historically restricted or historically empowered? Differences in access to historical content knowledge between low‐ and high‐SES pupils

Sequential Bayesian Ability Estimation Applied to Mixed-Format Item Tests

Causes of the Shortage of Physics Teachers in Croatia

English Learners and Constructed-Response Science Test Items: Challenges and Opportunities

Regularized Mislevy-Wu Model for Handling Nonignorable Missing Item Responses

The Effect of Mixed-Format Anchor Item Composition Methods on Mixed-Format Test Equating

A Study on Applying Cognitive Diagnostic Assessment to Constructed-Response Mathematics Items Using the Diagnostic Tree Model

Examining how using dichotomous and partial credit scoring models influence sixth‐grade mathematical problem‐solving assessment outcomes

Epistemic knowledge – a vital part of scientific literacy?
