Abstract

A measurement tool that does not measure the intended construct has a validity problem, and individuals should not be evaluated on the basis of results obtained from such a tool. The purpose of this study was to examine differential item functioning (DIF) and item bias across gender in the mathematics items of the Programme for International Student Assessment (PISA) 2012 using a two-level hierarchical generalized linear model, logistic regression, and experts' opinions. Potential student-level sources of DIF (anxiety, interest, and self-efficacy) were also tested. The study employed a mixed design combining quantitative and qualitative methods and was conducted with 1458 students selected from 166 schools in the Turkish sample. The results reveal that the hierarchical generalized linear model approach is more conservative than the logistic regression approach. When the student-level variables were added to the model as potential sources, DIF did not disappear for three items. In addition, half of the experts argued that the items identified as favoring boys are biased, citing the wording of the items and their context as the reasons for this bias.
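To make the logistic regression approach named above concrete, below is a minimal, illustrative sketch (not the authors' code) of the standard logistic-regression DIF screen for a single dichotomous item: a matching criterion (ability) enters first, a group term tests for uniform DIF, and an ability-by-group interaction tests for non-uniform DIF. The data are simulated and all variable names are hypothetical; in the actual study the matching criterion and item responses come from the PISA 2012 mathematics test.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated stand-in data (the study used real PISA 2012 responses)
rng = np.random.default_rng(0)
n = 1458                                    # sample size matching the study
df = pd.DataFrame({
    "ability": rng.normal(size=n),          # matching criterion (e.g., total score)
    "girl": rng.integers(0, 2, size=n),     # focal-group indicator (hypothetical coding)
})
# Item response generated with a small uniform DIF effect
true_logit = -0.2 + 1.1 * df["ability"] - 0.4 * df["girl"]
df["item"] = rng.binomial(1, 1.0 / (1.0 + np.exp(-true_logit)))

# Nested models: ability only; + group (uniform DIF); + interaction (non-uniform DIF)
m0 = smf.logit("item ~ ability", data=df).fit(disp=0)
m1 = smf.logit("item ~ ability + girl", data=df).fit(disp=0)
m2 = smf.logit("item ~ ability * girl", data=df).fit(disp=0)

# Likelihood-ratio chi-square tests between nested models flag DIF
print("uniform DIF     LR chi2(1) =", round(2 * (m1.llf - m0.llf), 2))
print("non-uniform DIF LR chi2(1) =", round(2 * (m2.llf - m1.llf), 2))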

Highlights

  • Programme for International Student Assessment (PISA) is a large-scale assessment study conducted worldwide by the Organisation for Economic Co-operation and Development (OECD)

  • The findings revealed that Items 4, 7, and 9, which were found to display differential item functioning (DIF) according to gender, still displayed DIF when the three student-level variables (anxiety, interest, and self-efficacy) were added to the model

  • The present study aims to examine the construct validity of the Turkish PISA 2012 mathematics test

Summary

Introduction

The Programme for International Student Assessment (PISA) is a large-scale assessment study conducted worldwide by the Organisation for Economic Co-operation and Development (OECD). PISA results and reports are among the sources of reference that countries examine in order to shape their education policies. Although it is not PISA's aim, countries also use these results to compare student performance and criticize one another. Such large-scale assessments, which are administered in translation across many languages, should be reliable and valid. If two students with the same achievement level do not perform the same on a mathematics question, what may be the reason? Is there a problem with the validity of the test, or is performance affected by other variables?
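As a sketch of how the two-level hierarchical generalized linear model used in the study can flag such an item (following the widely used Kamata-type item-response formulation; the authors' exact specification may differ), let $p_{ij}$ be the probability that student $j$ answers item $i$ correctly. Level 1 models the item responses within each student,

\log\left(\frac{p_{ij}}{1 - p_{ij}}\right) = \beta_{0j} + \sum_{q} \beta_{qj} X_{qij},

where $X_{qij}$ is a dummy indicator equal to 1 when $i = q$. Level 2 models the studied item's coefficient with a student-level gender indicator $G_j$,

\beta_{qj} = \gamma_{q0} + \gamma_{q1} G_j,

so that a $\gamma_{q1}$ significantly different from zero flags uniform DIF on item $q$. Adding student-level covariates such as anxiety, interest, and self-efficacy to the level-2 equation then tests whether they account for the flagged DIF.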

