The Method for Comprehensive Quality Evaluation of Tests. Part 2

V M Kukharenko, L P Perkhun, N M Tovmachenko

doi:10.31767/su.4(83)2018.04.09

Abstract

This article describes a method for the comprehensive evaluation of test quality that combines classical test theory, Data Mining, and Item Response Theory (IRT). The full method has six steps; this article covers steps 4-6. The fourth step evaluates the reliability of the test. A universal two-stage procedure is proposed: the reliability of individual test items is assessed with the Kuder-Richardson internal consistency coefficient, and the reliability of the test as a whole is assessed with the generalizability coefficient. The first coefficient is considered acceptable at 0.7 and above, the second at 0.8 and above. The second coefficient was calculated from a two-way ANOVA without replication in SPSS. The fifth step assesses how well the test under study differentiates students. The tools chosen for this are hierarchical clustering procedures, classification trees, and discriminant classification functions; the calculations were performed in Statistica and SPSS. Three clusters of students were identified, with high, medium, and low academic performance, which shows that the test under study does differentiate students. The sixth and final step studies the quality of the test using the one-parameter Rasch model. The difficulty of each test item and each student's level of mastery of the study material are measured in logits. Analytical expressions are given for the item characteristic curve and the person characteristic curve, along with auxiliary formulas for computing them. The description is illustrated with a specific example.
It is noted that the person characteristic curves based on the Rasch model, plotted in MathCAD, clearly divide the students into two groups: strong (positive logits) and weak (negative logits). Recommendations are formulated for interpreting the results for individual test items. In particular, when the characteristic curves of different test items overlap, the overlapping items must be deleted (norm-referenced test) or reworked (criterion-referenced test). This paper does not consider how to determine which item should be deleted or corrected, but notes that this can be established with the two-parameter Birnbaum model. If the characteristic curves of the test items are unevenly spaced, it is recommended to add test items (norm-referenced test) or to change the duplicate items accordingly (criterion-referenced test) so as to fill the gaps along the abscissa where there are no characteristic curves. As the practical implementation of this method, the authors envisage the development of a separate plug-in compatible with the Moodle distance learning platform. As the direction for further theoretical research, the authors identify the study of the limits of applicability of the two-parameter and three-parameter Birnbaum models for improving testing procedures and student test results in distance learning systems.
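The sixth step can be illustrated with a minimal Python sketch of the one-parameter Rasch model. It is not the authors' MathCAD implementation: the function names, the response matrix, and the starting-value formulas (theta = ln(p/q) for students, beta = ln(q/p) for items) are assumptions chosen for illustration, and the article's own auxiliary formulas may differ.

```python
import math


def rasch_probability(theta, beta):
    """Probability of a correct answer under the one-parameter logistic (Rasch) model.

    theta: student ability in logits; beta: item difficulty in logits.
    """
    return 1.0 / (1.0 + math.exp(-(theta - beta)))


def initial_logits(responses):
    """Initial logit estimates from a 0/1 response matrix (rows: students, columns: items).

    Assumed starting values: theta_i = ln(p_i / q_i) and beta_j = ln(q_j / p_j),
    where p is the proportion of correct answers and q = 1 - p.
    """
    n_students = len(responses)
    n_items = len(responses[0])

    def logit(p):
        if p <= 0.0 or p >= 1.0:
            raise ValueError("all-correct or all-wrong patterns have no finite logit")
        return math.log(p / (1.0 - p))

    thetas = [logit(sum(row) / n_items) for row in responses]
    betas = [-logit(sum(row[j] for row in responses) / n_students)
             for j in range(n_items)]
    return thetas, betas


# Hypothetical data: 4 students x 4 items (1 = correct, 0 = incorrect).
responses = [
    [1, 1, 1, 0],
    [1, 0, 0, 0],
    [0, 1, 0, 1],
    [1, 1, 0, 0],
]
thetas, betas = initial_logits(responses)

# Split students by the sign of their logit, as the abstract describes.
strong = [i for i, t in enumerate(thetas) if t > 0]
weak = [i for i, t in enumerate(thetas) if t < 0]

# One point on the characteristic curve of item 0, evaluated for student 0.
p_00 = rasch_probability(thetas[0], betas[0])
```

Evaluating `rasch_probability` over a grid of `theta` values for a fixed `beta` traces one item characteristic curve; items whose `beta` values nearly coincide produce the overlapping curves discussed above.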

Highlights

The presentation of the method for comprehensive evaluation of test quality continues; the method is based on classical test theory, Data Mining, and Item Response Theory (IRT)
An excessively large positive value indicates either high test difficulty or poor preparation of the student group
I. Adaptive tests: statistical methods for analysing the results of test-based knowledge assessment // Mathematical Machines and Systems

Summary

SOCIAL STATISTICS

The presentation of the method for comprehensive evaluation of test quality continues; the method is based on classical test theory, Data Mining, and Item Response Theory (IRT). On the one hand, two sets of test items must be aimed at measuring either a single property of the student (norm-referenced testing) or the degree of mastery of one and the same body of knowledge (criterion-referenced tests). This approach rests on the assumption that the distribution of item scores in the set of tests under study is described by a nor-. The procedure has two stages: 1. Evaluating the reliability of individual test items by the internal consistency coefficient. 2. Evaluating the reliability of the test as a whole (the entire set of scores) by the generalizability coefficient [10; 13], whose formula includes the variance of students' scores and the variance of item difficulty. The generalizability coefficient can be calculated from the results of a two-way ANOVA without replication. The coefficient we calculated indicates satisfactory reliability of the individual test items. For the generalizability coefficient, we present the results of a two-way ANOVA without replication performed in the SPSS statistical package (Table 7, based on the data of matrix (1)).
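The two reliability stages can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' SPSS procedure: the score matrix is hypothetical, KR-20 is computed with population variances, and the generalizability coefficient is the textbook relative coefficient for a students-by-items design, var_students / (var_students + var_residual / k), which may differ from the formula cited in [10; 13].

```python
def kr20(scores):
    """Kuder-Richardson formula 20 for a 0/1 score matrix (rows: students, columns: items).

    Assumes the total scores are not all equal (non-zero variance).
    """
    n = len(scores)
    k = len(scores[0])
    totals = [sum(row) for row in scores]
    mean_total = sum(totals) / n
    # Population variance of total scores (an assumption; some texts divide by n - 1).
    var_total = sum((t - mean_total) ** 2 for t in totals) / n
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in scores) / n
        sum_pq += p * (1.0 - p)
    return k / (k - 1) * (1.0 - sum_pq / var_total)


def variance_components(scores):
    """Variance components from a two-way ANOVA without replication."""
    n = len(scores)
    k = len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    ss_students = k * sum((m - grand) ** 2 for m in row_means)
    ss_items = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_residual = ss_total - ss_students - ss_items
    ms_students = ss_students / (n - 1)
    ms_items = ss_items / (k - 1)
    ms_residual = ss_residual / ((n - 1) * (k - 1))
    var_students = (ms_students - ms_residual) / k
    var_items = (ms_items - ms_residual) / n
    return var_students, var_items, ms_residual


def generalizability(scores):
    """Relative generalizability coefficient for a students-by-items design."""
    k = len(scores[0])
    var_students, _var_items, var_residual = variance_components(scores)
    return var_students / (var_students + var_residual / k)


# Hypothetical data: 5 students x 4 items (1 = correct, 0 = incorrect).
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
item_reliability = kr20(scores)
test_reliability = generalizability(scores)
```

On this hypothetical matrix both coefficients equal 0.8. The abstract's acceptance thresholds are 0.7 for the first coefficient and 0.8 for the second.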

Source of variation

Assignment of test-takers to clusters in the hierarchical cluster analysis

For calculating the input data for the model

Test-taker number θ