In biology learning, test instruments are essential for assessing students' understanding of complex concepts. Although the test instrument is a crucial component of learning evaluation, item-quality analysis is rarely carried out in practice. This descriptive quantitative study analyzes the quality of test items using both the classical approach (validity, reliability, difficulty index, discrimination power, and distractor effectiveness) and Rasch model analysis. The data consist of 30 multiple-choice questions from a biology midterm exam administered to 40 students. Classical test data were analyzed with Microsoft Excel, and Rasch model data with Winsteps software. Both approaches identify 14 valid and 16 invalid questions. Reliability is 0.619 (adequate) by Cronbach's Alpha in the classical approach, while the Rasch model yields 0.85 (good) for item reliability and 0.65 (weak) for person reliability. Both classical test theory and the Rasch model categorize item difficulty into four levels. The classical approach yields five item-discrimination categories, while the Rasch model identifies three item groups based on the item separation index (H = 3.45) and two groups based on respondent ability (H = 1.96). Distractor analysis shows 93.3% functional distractors under the classical approach and 80% under the Rasch model. The Rasch model offers greater precision in measuring student ability and detecting bias, so the two models should be integrated for comprehensive item analysis. Future tests should focus on revising invalid items and improving distractor quality.
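The Cronbach's Alpha reliability reported above can be computed directly from a student-by-item score matrix. The sketch below is illustrative only (the `demo` data and function name are hypothetical, not taken from the study, which used Microsoft Excel); it applies the standard formula alpha = (k/(k-1)) * (1 - sum of item variances / variance of total scores) to dichotomous (0/1) item scores.

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's Alpha for an (n_students, n_items) matrix of 0/1 item scores."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                          # number of items
    item_var = scores.var(axis=0, ddof=1)        # sample variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of students' total scores
    return (k / (k - 1)) * (1 - item_var.sum() / total_var)

# Hypothetical example: 5 students answering 4 dichotomous items
demo = [[1, 1, 1, 0],
        [1, 0, 1, 1],
        [0, 0, 0, 0],
        [1, 1, 1, 1],
        [1, 0, 0, 1]]
alpha = cronbach_alpha(demo)   # ≈ 0.696 for this toy data
```

In the study's setting the matrix would be 40 students by 30 items; a value such as the reported 0.619 falls in the commonly used "adequate" band (0.6–0.7).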