Dataset of Program Source Codes Solving Unique Programming Exercises Generated by Digital Teaching Assistant

Liliya A Demidova, Peter N Sovietov, Artyom V Gorchakov, Elena G Andrianova

doi:10.3390/data8060109

Open Access

Journal: Data	Publication Date: Jun 14, 2023
Citations: 6	License type: CC BY 4.0

Affiliation: MIREA - Russian Technological University

Abstract

This paper presents a dataset containing automatically collected source codes solving unique programming exercises of different types. The programming exercises were automatically generated by the Digital Teaching Assistant (DTA) system that automates a massive Python programming course at MIREA—Russian Technological University (RTU MIREA). Source codes of the small programs grouped by the type of the solved task can be used for benchmarking source code classification and clustering algorithms. Moreover, the data can be used for training intelligent program synthesizers or benchmarking mutation testing frameworks, and more applications are yet to be discovered. We describe the architecture of the DTA system, aiming to provide detailed insight regarding how and why the dataset was collected. In addition, we describe the algorithms responsible for source code analysis in the DTA system. These algorithms use vector representations of programs based on Markov chains, compute pairwise Jensen–Shannon divergences of programs, and apply hierarchical clustering algorithms in order to automatically discover high-level concepts used by students while solving unique tasks. The proposed approach can be incorporated into massive programming courses when there is a need to identify approaches implemented by students.
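The analysis pipeline summarized in the abstract (Markov-chain-based program vectors, pairwise Jensen–Shannon divergences, hierarchical clustering) can be illustrated with a minimal sketch. It is not the DTA implementation: it assumes that a program is represented by the globally normalized counts of parent-to-child node type transitions in its Python AST, and that clusters are merged by average linkage until a divergence threshold is exceeded. The function names, the sample programs, and the threshold value are all hypothetical.

```python
import ast
import math
from collections import Counter
from itertools import combinations


def transition_distribution(source):
    """Empirical distribution over (parent type, child type) AST edges.

    Assumption: global normalization of the transition counts, so that
    two programs can be compared as plain probability distributions.
    """
    counts = Counter()
    for parent in ast.walk(ast.parse(source)):
        for child in ast.iter_child_nodes(parent):
            counts[(type(parent).__name__, type(child).__name__)] += 1
    total = sum(counts.values())
    return {edge: n / total for edge, n in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits (0 for identical distributions)."""
    result = 0.0
    for key in set(p) | set(q):
        pk, qk = p.get(key, 0.0), q.get(key, 0.0)
        mk = (pk + qk) / 2
        if pk > 0:
            result += 0.5 * pk * math.log2(pk / mk)
        if qk > 0:
            result += 0.5 * qk * math.log2(qk / mk)
    return result


def cluster(distributions, threshold):
    """Average-linkage agglomerative clustering, stopped at `threshold`."""
    clusters = [[i] for i in range(len(distributions))]

    def linkage(a, b):
        pairs = [(i, j) for i in a for j in b]
        return sum(js_divergence(distributions[i], distributions[j])
                   for i, j in pairs) / len(pairs)

    while len(clusters) > 1:
        # Find the closest pair of clusters; x < y by construction.
        x, y = min(combinations(range(len(clusters)), 2),
                   key=lambda xy: linkage(clusters[xy[0]], clusters[xy[1]]))
        if linkage(clusters[x], clusters[y]) > threshold:
            break
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


# Hypothetical student solutions: two loop-based, one generator-based.
programs = [
    "def total(values):\n    result = 0\n    for v in values:\n        result += v\n    return result\n",
    "def accumulate(items):\n    acc = 1\n    for item in items:\n        acc += item\n    return acc\n",
    "def total(values):\n    return sum(v for v in values)\n",
]
distributions = [transition_distribution(src) for src in programs]
groups = cluster(distributions, threshold=0.2)  # threshold is an arbitrary choice
print(groups)
```

The two loop-based programs differ only in identifiers and constants, so their AST transition distributions coincide and they merge first; the generator-based solution stays in its own group. In practice the threshold and the linkage criterion would need tuning against real submissions.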

Full Text