Abstract

Code readability models are typically based on the code's structural and textual features, considering code readability as an objective category. However, readability is inherently subjective and dependent on the knowledge and experience of the reader analyzing the code. This paper assesses the readability of Python code statements commonly used in undergraduate programming courses. Our readability model is based on tracking the reader's eye movement during the while‐read phase. It uses machine learning (ML) techniques and relies on a novel set of features—observational features—that capture how the readers read the code. We experimented by tracking the eye movement of 90 undergraduate students while assessing the readability of 48 Python code snippets. We trained an ML model that predicts readability based on the collected observational data and the code snippet's structural and textual features. In our experiments, the XGBoost classifier trained using observational features exclusively achieved the best results (0.85 F‐measure). Using correlation analysis, we identified Python statements most affecting readability for undergraduate students and proposed implications for teaching Python programming. In line with findings for Java language, we found that constructs related to the code's size and complexity hurt the code's readability. Numerous comments also hindered readability, potentially due to their association with less readable code. Some Python‐specific statements (list comprehension, lambda function, and dictionary comprehension) harmed code readability, even though they were part of the curriculum. Tracking students' gaze indicated some additional factors, most notably nonlinearity introduced by if, for, while, try, and function call statements.
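
For orientation, the sketch below illustrates the kind of classification setup the abstract describes: an XGBoost classifier trained on observational features and evaluated with the F-measure. The feature names (fixation count, mean fixation duration, regression count) and the synthetic data are illustrative assumptions; the paper's actual feature set and labels come from its eye-tracking study and are not reproduced here.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score
from xgboost import XGBClassifier

rng = np.random.default_rng(0)

# Hypothetical observational features per reading session. Real values
# would come from an eye tracker; these are synthetic placeholders.
n_sessions = 480
X = np.column_stack([
    rng.poisson(40, n_sessions),        # fixation count
    rng.normal(250, 50, n_sessions),    # mean fixation duration (ms)
    rng.poisson(8, n_sessions),         # regressions (backward saccades)
])
# Toy binary label: 1 = snippet judged "readable" (illustrative rule only).
y = (X[:, 2] < 8).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Gradient-boosted tree classifier, as in the paper's best-performing model.
clf = XGBClassifier(n_estimators=200, max_depth=3, eval_metric="logloss")
clf.fit(X_train, y_train)

print("F-measure:", f1_score(y_test, clf.predict(X_test)))
```

The choice of the F-measure mirrors the metric reported in the abstract (0.85 for the model trained on observational features alone); on the synthetic data above, the printed score is meaningless beyond demonstrating the pipeline.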
