Abstract

Open-ended items are among the most commonly used methods for measuring higher-order thinking skills, such as problem-solving and written expression. Three main approaches are used to evaluate responses to open-ended items: general evaluation, the rating scale, and the rubric. Measuring and improving students' problem-solving skills first requires a measurement process that is as free of error as possible. Rater error is a common problem in the evaluation of open-ended items. Rater-induced errors, such as bias or a tendency to score too high or too low, adversely affect the accuracy of the decisions to be made. In this study, raters' tendencies were evaluated under the general evaluation, rating scale, and rubric conditions used to evaluate open-ended items. Rater behaviors under each assessment method and the raters' opinions about the assessment methods were determined. The participants were 12 mathematics teachers, and the analyses were based on the Many-Facet Rasch Model. The scoring reliability of each method was estimated. It was concluded that the raters showed a more homogeneous scoring tendency when using the rating scale. In addition, although the majority of raters stated that they preferred the rubric, they also reported that it was the most difficult method to use.
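For orientation, the Many-Facet Rasch Model extends the Rasch model with a rater facet, so that rater severity can be estimated alongside student ability and item difficulty. A standard formulation (following Linacre's many-facet model; shown here as a sketch for context rather than the study's exact specification) is:

$$\log\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = B_n - D_i - C_j - F_k$$

where $P_{nijk}$ is the probability that student $n$ receives category $k$ on item $i$ from rater $j$, $B_n$ is the student's ability, $D_i$ the item's difficulty, $C_j$ the rater's severity, and $F_k$ the difficulty of reaching category $k$ from category $k-1$. A rater with a large positive $C_j$ scores systematically harshly; the homogeneity finding above concerns how widely the $C_j$ estimates spread under each assessment method.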

Highlights

  • The quality of assessment, monitoring, and evaluation processes is directly related to the quality of the measurement tools used in these processes

  • The present study examines how rater tendencies change across the general evaluation, rating scale, and rubric methods

  • It also examines raters' scoring tendencies under each assessment method and determines which method the raters prefer and why



Introduction

The quality of assessment, monitoring, and evaluation processes is directly related to the quality of the measurement tools used in these processes. The quality of these tools, that is, their ability to measure with as little error as possible, is determined by the quality of the items they contain. Different item types have been developed to measure learning at different cognitive levels during the education process (Çıkrıkçı, 2010). In classroom assessment, where multiple kinds of knowledge and skills are measured at different cognitive levels, different item structures can be used together. Many large-scale national and international assessment programs, such as the National Assessment of Educational Progress (NAEP), the Scholastic Aptitude Test (SAT), the Trends in International Mathematics and Science Study (TIMSS), and the Programme for International Student Assessment (PISA), include both multiple-choice and open-ended items (DeCarlo, Kim, & Johnson, 2011; Kim, 2009; Mariano, 2002).


