Abstract

The chapter discusses a discrepancy in the difficulty levels of test items between the test developers’ judgments and students’ performance. Previous studies showed that difficulty level is critical in multiple-choice question tests (Naqvi et al., Procedia—Social and Behavioral Sciences 2:3909–3913, 2010; Sim and Rasiah, Annals Academy of Medicine Singapore 35:67–71, 2006), and a high number of invalid test items also reduces the effectiveness of a test (Ratnaningsih & Isfarudi, 2010). The aim of the study was to compare the difficulty level of the test items as judged by the test developers with the difficulty level obtained from item analysis. The hypothesis is that if there is a gap between the two difficulty levels, the test is less effective. The study used data from three examination results of BIOL4110 (a General Biology test at Universitas Terbuka, Indonesia) across three consecutive semesters in 2014–2015. Participant numbers for each examination were 469, 536, and 520 students. The relationship between the two difficulty levels was analysed with a Chi-square test. In addition, the relevance of the test to the textbook was analysed using KR20, together with an analysis of the discrimination index. The analysis showed that in each semester there were always differences between the difficulty levels from test developer judgment and those from item analysis. The relevance level of the test was greater than 0.5, which is good, while the discrimination index was not, since some test items had rpbis of <0.3. However, the passing rate of each test (62–73%) was satisfactory.
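The item statistics named above (classical difficulty, KR20 reliability, and the point-biserial discrimination index rpbis) can be sketched for dichotomous items as follows. This is a minimal illustration on simulated 0/1 responses; the response matrix, function names, and the 0.3 flagging threshold are assumptions for demonstration, not the study's actual BIOL4110 data or code.

```python
import numpy as np

# Hypothetical 0/1 response matrix: rows = students, columns = test items.
# (Simulated data only, with item success probabilities spread from 0.3 to 0.9.)
rng = np.random.default_rng(0)
responses = (rng.random((200, 10)) < np.linspace(0.3, 0.9, 10)).astype(int)

def item_difficulty(resp):
    # Classical difficulty p: proportion of students answering each item correctly.
    return resp.mean(axis=0)

def point_biserial(resp):
    # Corrected item-total correlation: each item score vs. the total score
    # of the remaining items (a common form of the rpbis discrimination index).
    total = resp.sum(axis=1)
    return np.array([np.corrcoef(resp[:, j], total - resp[:, j])[0, 1]
                     for j in range(resp.shape[1])])

def kr20(resp):
    # Kuder-Richardson formula 20 reliability for dichotomous items.
    k = resp.shape[1]
    p = resp.mean(axis=0)
    var_total = resp.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - (p * (1 - p)).sum() / var_total)

p = item_difficulty(responses)
r = point_biserial(responses)
print("difficulty:", np.round(p, 2))
print("rpbis:", np.round(r, 2))
print("KR-20:", round(kr20(responses), 2))
print("items flagged (rpbis < 0.3):", np.where(r < 0.3)[0].tolist())
```

In this framework, items with rpbis below roughly 0.3 discriminate poorly between stronger and weaker students, which is the criterion the abstract applies when it reports the discrimination index as unsatisfactory.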
