Abstract

In competitive programming, the standard solutions to easy tasks are usually simple and short, so submissions converge in both idea and text. This large gap in submission diversity between easy and hard tasks poses an inescapable challenge for plagiarism detection based on similarity thresholding. In this paper, drawing on rich data from the China National Olympiad in Informatics (NOI), we study the statistical characteristics of submission similarities across tasks spanning a wide range of difficulty. We then propose a new adaptive method that detects plagiarized submissions as out-of-distribution samples, together with a large-scale challenge dataset for competitive programming plagiarism detection. Our method achieves higher accuracy and robustness, making it feasible and reliable for large-scale competitive programming contests.
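
To make the core difficulty concrete: a single global similarity threshold behaves very differently on easy and hard tasks, which is what motivates an adaptive, per-task notion of "suspiciously similar". The sketch below is purely illustrative and not the paper's actual method; the similarity scores, submission ids, and the z-score rule are all hypothetical assumptions chosen only to show why per-task outlier detection avoids the failure modes of a fixed cutoff.

```python
import statistics

def flag_suspects(similarities, z_cutoff=2.5):
    """Flag submissions whose similarity is an outlier for *this* task.

    `similarities` maps a submission id to its highest similarity score
    against any other submission for the same task (0.0 - 1.0).
    A fixed global threshold over-flags easy tasks (where honest code
    naturally converges) and under-flags hard tasks; here each score is
    instead compared to the task's own similarity distribution.
    """
    values = list(similarities.values())
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values) or 1e-9  # guard against zero spread
    return {sid for sid, s in similarities.items()
            if (s - mean) / stdev > z_cutoff}

# Easy task: almost every honest submission already looks alike, so a
# fixed cutoff of 0.8 would flag everyone; the per-task rule only
# flags the genuinely anomalous s10.
easy = {f"s{i}": v for i, v in enumerate(
    [0.80, 0.81, 0.82, 0.83, 0.83, 0.84, 0.85, 0.85, 0.86, 0.99], start=1)}

# Hard task: baseline similarity is low, so 0.70 is already an outlier
# even though it would pass unnoticed under the fixed 0.8 cutoff.
hard = {f"s{i}": v for i, v in enumerate(
    [0.15, 0.18, 0.20, 0.21, 0.22, 0.24, 0.25, 0.27, 0.30, 0.70], start=1)}

print(flag_suspects(easy))  # {'s10'}
print(flag_suspects(hard))  # {'s10'}
```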
