Abstract

The emergence of online crowdsourcing sites, online work platforms, and even Massive Open Online Courses (MOOCs) has created an increasing need to reliably evaluate the skills of participating users in a scalable way. Many platforms already allow users to take online tests to verify their skills, but existing approaches face several problems. First, cheating is very common in online testing without supervision, as test questions often "leak" and become easily available online together with the answers. Second, technical skills, such as programming, require tests to be frequently updated in order to reflect the current state of the art. Third, there is very limited evaluation of the tests themselves and of how effectively they measure the skill for which users are tested. In this paper, we present a Scalable Testing and Evaluation Platform (STEP) that allows continuous generation and evaluation of test questions. STEP leverages content already available on Question Answering sites such as StackOverflow and re-purposes these questions to generate tests. The system uses a crowdsourcing component for editing the questions, and automated techniques for identifying promising QA threads that can be successfully re-purposed for testing. This continuous question generation decreases the impact of cheating and also creates questions that are closer to the real problems that the skill holder is expected to solve in practice. STEP also leverages Item Response Theory to evaluate the quality of the questions. In addition, we use external signals about the quality of the workers; these identify the questions with the strongest predictive ability in distinguishing workers who have the potential to succeed in online job marketplaces. In contrast, existing approaches use only internal consistency metrics to evaluate the questions. Finally, our system employs an automatic "leakage detector" that queries the Internet to identify leaked versions of our questions. We then mark these questions as "practice only," effectively removing them from the pool of questions used for evaluation. Our experimental evaluation shows that our system generates questions of comparable or higher quality than existing tests, at a cost of approximately 3-5 dollars per question, which is lower than the cost of licensing questions from existing test banks.
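As a rough illustration of the Item Response Theory component mentioned above, the sketch below uses the standard two-parameter logistic (2PL) model to relate worker ability to the probability of answering a question correctly, and flags questions whose discrimination is too low to separate strong from weak workers. The abstract does not specify which IRT parameterization STEP fits; the model choice, parameter values, question ids, and discrimination threshold here are illustrative assumptions only.

```python
import numpy as np

def p_correct(theta, a, b):
    """Two-parameter logistic (2PL) IRT model: probability that a worker with
    ability theta answers correctly a question with discrimination a and
    difficulty b. (Assumed parameterization; the paper only states that IRT
    is used to evaluate question quality.)"""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def flag_low_discrimination(item_params, min_discrimination=0.5):
    """Return ids of questions whose fitted discrimination is too low to
    distinguish high-ability from low-ability workers; such questions are
    candidates for removal from the scored pool. The threshold is a
    hypothetical value, not one reported in the paper."""
    return [qid for qid, (a, b) in item_params.items() if a < min_discrimination]

# Example with two hypothetical questions and their fitted (a, b) parameters.
items = {"q1": (1.4, 0.2), "q2": (0.3, -1.0)}
print(p_correct(theta=1.0, a=1.4, b=0.2))  # ~0.75 for an above-average worker
print(flag_low_discrimination(items))      # ['q2'] has too little discrimination
```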
