Abstract
In recent years, the computerised adaptive test (CAT) has gained popularity over conventional exams for evaluating student capabilities with the desired accuracy. However, the key limitation of CAT is that it requires a large pool of pre-calibrated questions. In the absence of such a pre-calibrated question bank, offline exams with uncalibrated questions have to be conducted. Many important large-scale exams are offline, for example the Graduate Aptitude Test in Engineering (GATE) and the Japanese University Entrance Examination (JUEE). In offline exams, marks are used as the indicator of students’ capabilities. In this work, our key contribution is to question whether the marks obtained are indeed a good measure of students’ capabilities. To this end, we propose an evaluation methodology that mimics the evaluation process of CAT. In our approach, based on the marks scored by students on various questions, we iteratively estimate question parameters such as difficulty, discrimination and the guessing factor, as well as student parameters such as capability, using the 3-parameter logistic ogive model. Our algorithm uses alternating maximisation to maximise the log-likelihood of the question and student parameters given the marks. We compare our approach with marks-based evaluation using simulations. The simulation results show that our approach outperforms marks-based evaluation.
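The paper itself gives no code, but the alternating-maximisation idea described in the abstract can be sketched as follows. The function names, the optimiser (SciPy's general-purpose minimize), the initial values and the bound on the guessing parameter below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import minimize

def p_correct(theta, a, b, c):
    """3-parameter logistic model: guessing floor c, discrimination a, difficulty b."""
    return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))

def neg_log_likelihood(theta, a, b, c, responses):
    """responses: students x items binary matrix (1 = correct answer)."""
    p = p_correct(theta[:, None], a[None, :], b[None, :], c[None, :])
    p = np.clip(p, 1e-9, 1 - 1e-9)
    return -np.sum(responses * np.log(p) + (1 - responses) * np.log(1 - p))

def alternating_mle(responses, n_iter=20):
    """Alternate between estimating student abilities and item parameters."""
    n_students, n_items = responses.shape
    theta = np.zeros(n_students)
    a, b, c = np.ones(n_items), np.zeros(n_items), np.full(n_items, 0.2)
    for _ in range(n_iter):
        # Step 1: fix item parameters, maximise the likelihood over student abilities.
        res = minimize(lambda t: neg_log_likelihood(t, a, b, c, responses), theta)
        theta = res.x
        # Step 2: fix abilities, maximise the likelihood over item parameters.
        def item_nll(x):
            a_, b_, c_ = np.split(x, 3)
            return neg_log_likelihood(theta, a_, b_, np.clip(c_, 0.0, 0.5), responses)
        res = minimize(item_nll, np.concatenate([a, b, c]))
        a, b, c = np.split(res.x, 3)
        c = np.clip(c, 0.0, 0.5)
    return theta, a, b, c
```

A usage example would pass a binary response matrix (one row per student, one column per question) to alternating_mle and rank students by the returned theta rather than by total marks.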
Highlights
Multiple-choice exams are the most popular assessment scheme for large-scale exams such as the computerised adaptive test (CAT), the Graduate Record Examinations (GRE), the Scholastic Aptitude Test (SAT) and so on.
In situations of “partial knowledge”, the guessing factor becomes more significant: even without knowing the correct answer to an item, a candidate who can eliminate a few of its distractors using partial information has a greater chance of answering it correctly (see the short illustration after this list).
The plot shows a very narrow band, indicating that for 90% of the exams over which this experiment is averaged, the number of false positives varies within a narrow range.
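As a back-of-the-envelope illustration of the partial-knowledge point (the numbers are ours, not from the paper), the probability of a correct pure guess rises sharply as distractors are eliminated:

```python
# Chance of guessing a 4-option item correctly, before and after a candidate's
# partial knowledge eliminates k distractors (illustrative numbers only).
options = 4
for k in (0, 1, 2):
    print(f"distractors eliminated: {k}, guess probability: {1 / (options - k):.2f}")
```

With two of three distractors ruled out, the guess probability doubles from 0.25 to 0.50, which is why a non-zero guessing parameter matters in the item model.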
Summary
Multiple-choice exams are the most popular assessment scheme for large-scale exams such as the computerised adaptive test (CAT), the Graduate Record Examinations (GRE), the Scholastic Aptitude Test (SAT) and so on. In multiple-choice exams where the total score is taken as the measure of a candidate’s capability, the particular items that the candidate answered correctly play no role in deciding his/her capability. In such a setting, answering an easy question correctly and answering a very difficult question correctly fetch the same credit, which does not seem appropriate. In large-scale offline exams such as GATE, students in the same discipline take the test at different test centres, answer different question papers, and are ranked in a common rank list. Since question papers differ across batches, comparing scores cannot be justified if total marks are used as the only deciding parameter.
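The information lost by equal-credit scoring can be seen directly from the 3-parameter logistic model. The item parameters below are illustrative assumptions, not values from the paper; they show that a correct answer on a hard item says much more about a candidate's ability than a correct answer on an easy one.

```python
import numpy as np

def p_correct(theta, a, b, c):
    """3PL success probability: guessing floor c plus a logistic term in ability theta."""
    return c + (1 - c) / (1 + np.exp(-a * (theta - b)))

# Illustrative items: an easy item (b = -1.5) and a hard item (b = +1.5),
# both with discrimination a = 1.5 and guessing factor c = 0.25.
for theta in (-1.0, 0.0, 1.5):
    easy = p_correct(theta, a=1.5, b=-1.5, c=0.25)
    hard = p_correct(theta, a=1.5, b=1.5, c=0.25)
    print(f"theta={theta:+.1f}: P(easy correct)={easy:.2f}, P(hard correct)={hard:.2f}")
```

Low-ability candidates already answer the easy item correctly with high probability, so a correct response there barely separates candidates, whereas a correct response on the hard item is strong evidence of high ability; marks-based evaluation treats both identically.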