Abstract

Programming contests such as the International Olympiad in Informatics (IOI) and the International Collegiate Programming Contest (ICPC) are effective for encouraging young and bright programmers. These contests require contestants to complete a few tasks (between three and nine) related to algorithmic problems within a limited time. For this study, we collected a set of 2,400 source codes submitted to the KOI (Korea Olympiad for Informatics) in 2011 and 2012, as well as 2,300 source codes submitted to the preliminary session of the ICPC East-Asia regional contest in 2009, 2011, and 2012. Because the submitted source codes were evaluated with blind test cases, we can define a criterion that separates high- and low-scoring students by ranking them according to their scores. The main objective of this paper is to reveal the relationship among the proposed code features, the task difficulty, the school level (elementary, middle, and high school), and the score, using the data-mining tool WEKA. The ultimate goal of this study is to predict the score of a given code through static analysis. We propose a simple and straightforward complexity measure based on the block-tree structure. We treated the high-scoring student group as the positive class and the low-scoring student group as the negative class. The performance of the Naive Bayes classifier is evaluated with 10-fold cross-validation. We empirically regard a classification as meaningful when the harmonic mean of sensitivity and specificity is larger than 0.6. Among the codes acquired through the KOI, we found a set of outlier codes that attempt to reply with the correct response to receive extra points. Among the codes acquired through the ICPC, we discovered that good collegiate programmers (i.e., those with high scores) attempt to keep their code more compact, both lexically and structurally. The results of the WEKA analysis with the code features proposed in this study are detailed quantitatively.
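
The acceptance threshold on the harmonic mean of sensitivity and specificity can be illustrated with a minimal Python sketch. It assumes the usual confusion-matrix definitions, with the high-scoring group as the positive class. The function names and the example counts are hypothetical and are not taken from the study's data or from WEKA.

```python
def sensitivity_specificity_hmean(tp, fn, tn, fp):
    """Return (sensitivity, specificity, harmonic mean) from confusion counts.

    tp/fn: high-scoring (positive) codes classified correctly/incorrectly.
    tn/fp: low-scoring (negative) codes classified correctly/incorrectly.
    """
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    total = sensitivity + specificity
    # Harmonic mean: 2 * a * b / (a + b); defined as 0 when both rates are 0.
    hmean = 2 * sensitivity * specificity / total if total else 0.0
    return sensitivity, specificity, hmean


def is_meaningful(tp, fn, tn, fp, threshold=0.6):
    """Apply the 'larger than 0.6' rule stated in the abstract."""
    return sensitivity_specificity_hmean(tp, fn, tn, fp)[2] > threshold


# Hypothetical counts, for illustration only.
sens, spec, h = sensitivity_specificity_hmean(tp=70, fn=30, tn=60, fp=40)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} harmonic mean={h:.3f}")
print("meaningful:", is_meaningful(70, 30, 60, 40))
```

Because the harmonic mean is pulled toward the smaller of the two rates, a classifier that labels nearly every code as high-scoring gets a low value even though its sensitivity is close to 1.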
