Articles published on Program comprehension
545 Search results
- Research Article
- 10.1142/s0129183125420227
- Nov 29, 2025
- International Journal of Modern Physics C
- Wudao Yang + 3 more
Scanpath Analysis of Programming Comprehension: A MultiMatch and Clustering-Based Approach
- Research Article
- 10.1142/s0129183125420239
- Nov 29, 2025
- International Journal of Modern Physics C
- Wudao Yang + 2 more
Automated Token-Level AOIs Using OCR for Predicting Novice Programmers’ Visual Attention
- Research Article
- 10.1002/spe.70030
- Nov 11, 2025
- Software: Practice and Experience
- Khalid Mahmood + 5 more
Objective: Antipatterns (APs) represent potential issues in software systems stemming from poor design choices, coding practices, and undisciplined development. This systematic literature review analyzes 97 primary studies (PSs) from 2005 to 2024, exploring the impact of APs on Object-Oriented (OO), Service-Oriented (SO), and Mobile-Oriented (MO) systems across various quality attributes. Methods: PSs are classified by techniques, datasets, evaluation measures, and tool support. Results: Findings highlight the association of APs with increased maintenance costs (27.8%), fault-proneness (26.8%), change-proneness (12.3%), and evolution challenges (25.7%). Most studies employ descriptive statistics, regression analysis, and Pearson correlation, with limited datasets and tool support for SO and MO systems compared to OO systems. Intermediate source code representations and program comprehension strategies are commonly used for analysis. Conclusion: These findings emphasize the need for further research on the impact of APs, particularly in MO systems, and their negative effects on software quality attributes.
- Research Article
- 10.34293/pijcmr.v13i3.2025.003
- Nov 1, 2025
- Primax International Journal of Commerce and Management Research
- Ajitesh + 1 more
This study explores the role of Microfinance Institutions (MFIs) in supporting Micro, Small, and Medium Enterprises (MSMEs) in the Delhi NCR region, particularly in facilitating cross-border commerce. The research investigates the financial, capacity-building, logistical, and digital support provided by MFIs to enable MSMEs' participation in international trade. Through a survey of MSME owners and managers, the study reveals that MFIs play a crucial role in offering export-focused loans, providing capacity-building programs, and supporting e-commerce adoption. However, despite these contributions, the study identifies gaps in the accessibility of low-interest loans, the comprehensiveness of export training programs, and the digital literacy of MSMEs. The findings show that 35% of respondents strongly agree that MFIs provide adequate export-focused loans, yet concerns over high interest rates persist. Furthermore, while training programs are generally appreciated, only 40% of respondents believe that export documentation training is sufficiently detailed. The study also highlights the significance of e-commerce platforms in enhancing MSMEs' global visibility, with 60% of respondents strongly agreeing that these platforms help reach international markets. Nevertheless, digital literacy and access to technology remain barriers for many MSMEs. Logistical support provided by MFIs, such as partnerships with logistics providers, is also deemed beneficial in reducing costs and improving reliability for MSMEs engaged in international trade. The study concludes that while MFIs play an essential role in supporting MSMEs, there is a need for more affordable financial products, comprehensive export training, enhanced digital literacy programs, and stronger logistics partnerships. Policymakers and MFIs must collaborate to develop strategies that address these challenges and empower MSMEs to succeed in global markets. This research contributes to a deeper understanding of how MFIs can foster MSME participation in cross-border trade.
- Research Article
- 10.1145/3763087
- Oct 9, 2025
- Proceedings of the ACM on Programming Languages
- Florian Sihler + 1 more
The R programming language is primarily designed for statistical computing and is mostly used by researchers without a background in computer science. R provides a wide range of dynamic features and peculiarities that are difficult to analyze statically, such as dynamic scoping and lazy evaluation with dynamic side effects. At the same time, the R ecosystem lacks sophisticated analysis tools that support researchers in understanding and improving their code. In this paper, we present a novel static dataflow analysis framework for the R programming language that is capable of handling the dynamic nature of R programs and produces the dataflow graph of given R programs. This graph can be essential in a range of analyses, including program slicing, which we implement as a proof of concept. The core analysis works as a stateful fold over a normalized version of the abstract syntax tree of the R program, which tracks (re-)definitions, values, function calls, side effects, external files, and dynamic control flow to produce one dataflow graph per program. We evaluate the correctness of our analysis using output equivalence testing on a manually curated dataset of 779 sensible slicing points from executable real-world R scripts. Additionally, we use a set of systematic test cases based on the capabilities of the R language and the implementation of the R interpreter, and measure the runtimes as well as the memory consumption on a set of 4,230 real-world R scripts and 20,815 packages available on R's package manager CRAN. Furthermore, we evaluate the recall of our program slicer, its accuracy using shrinking, and its improvement over the state of the art. We correctly analyze almost all programs in our equivalence test suite, preserving the identical output for 99.7% of the manually curated slicing points. On average, we require 576 ms to analyze the dataflow and around 213 kB to store the graph of a research script. This shows that our analysis is capable of analyzing real-world sources quickly and correctly. Our slicer achieves an average reduction of 84.8% of tokens, indicating its potential to improve program comprehension.
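For readers unfamiliar with slicing: the dataflow-based backward slicing described in the abstract can be illustrated with a toy sketch. The following Python example is hypothetical and greatly simplified (straight-line code only, statements pre-encoded as definition/use pairs); it is not the authors' R implementation.

```python
# Toy backward program slicer: given straight-line statements encoded as
# (target, used_variables) pairs, keep only the statements that the
# slicing criterion transitively depends on.

def backward_slice(statements, criterion):
    """statements: list of (target, deps) in program order.
    criterion: the variable whose computation we want to isolate."""
    needed = {criterion}
    kept = []
    # Walk backwards: keep a statement if it defines a needed variable;
    # its operands then become needed in turn.
    for target, deps in reversed(statements):
        if target in needed:
            kept.append((target, deps))
            needed.discard(target)
            needed.update(deps)
    kept.reverse()
    return kept

# Hypothetical five-statement program:
program = [
    ("x", set()),    # x <- 1
    ("y", set()),    # y <- 2
    ("z", {"x"}),    # z <- x + 1
    ("w", {"y"}),    # w <- y * 2
    ("out", {"z"}),  # out <- z
]
slice_ = backward_slice(program, "out")  # keeps only x, z, out
```

Real slicers like the one evaluated above additionally handle control flow, function calls, and side effects, which is where the dataflow graph becomes essential.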
- Research Article
- 10.1145/3763054
- Oct 9, 2025
- Proceedings of the ACM on Programming Languages
- Michael Schröder + 1 more
Parsing—the process of structuring a linear representation according to a given grammar—is a fundamental activity in software engineering. While formal language theory has provided theoretical foundations for parsing, the most common parsers used in practice are written ad hoc. They use common string operations without explicitly defining an input grammar. These ad hoc parsers are often intertwined with application logic and can result in subtle semantic bugs. Grammars, which are complete formal descriptions of input languages, can enhance program comprehension, facilitate testing and debugging, and provide formal guarantees for parsing code. But writing grammars—e.g., in the form of regular expressions—can be tedious and error-prone. Inspired by the success of type inference in programming languages, we propose a general approach for static inference of regular input string grammars from unannotated ad hoc parser source code. We use refinement type inference to synthesize logical and string constraints that represent regular parsing operations, which we then interpret with an abstract semantics into regular expressions. Our contributions include a core calculus λΣ for representing ad hoc parsers, a formulation of (regular) grammar inference as refinement inference, an abstract interpretation framework for solving string refinement variables, and a set of abstract domains for efficiently representing the constraints encountered during regular ad hoc parsing. We implement our approach in the PANINI system and evaluate its efficacy on a benchmark of 204 Python ad hoc parsers. Compared with state-of-the-art approaches, PANINI produces better grammars (100% precision, 93% average recall) in less time (0.82 ± 2.85 s) without prior knowledge of the input space.
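The gap between an ad hoc parser and an explicit grammar can be made concrete with a small sketch. The example below is hypothetical: the parser and the regular expression are hand-written for illustration, whereas PANINI infers such grammars automatically from parser source code.

```python
import re

# An "ad hoc parser": plain string operations, no explicit grammar.
def parse_version(s):
    parts = s.split(".")
    if len(parts) != 2:
        raise ValueError("expected MAJOR.MINOR")
    major, minor = parts
    if not (major.isdigit() and minor.isdigit()):
        raise ValueError("numeric components required")
    return int(major), int(minor)

# The input language this parser accepts, written as an explicit regular
# grammar (hand-derived here, and approximate: str.isdigit also accepts
# some non-ASCII digit characters that [0-9] does not).
VERSION_GRAMMAR = re.compile(r"^[0-9]+\.[0-9]+$")
```

The point of grammar inference is that the second artifact documents, tests, and constrains the first one, without requiring the developer to write it by hand.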
- Research Article
- 10.1145/3720540
- Oct 3, 2025
- ACM Transactions on Software Engineering and Methodology
- Zongwen Shen + 6 more
In recent years, pre-trained language models have seen significant success in natural language processing and have been increasingly applied to code-related tasks. Code intelligence tasks have shown promising performance with the support of code pre-trained language models. Pre-processing code simplification methods have been introduced to prune code tokens from the model's input while maintaining task effectiveness. These methods improve the efficiency of code intelligence tasks while reducing computational costs. Post-prediction code simplification methods provide explanations for code intelligence task outcomes, enhancing the reliability and interpretability of model predictions. However, comprehensive evaluations of these methods across diverse code pre-trained model architectures and code intelligence tasks are lacking. To assess the effectiveness of code simplification methods, we conduct an empirical study integrating these code simplification methods with various pre-trained code models across multiple code intelligence tasks. Our empirical findings suggest that developing task-specific code simplification methods would be beneficial. Then, we recommend leveraging post-prediction methods to summarize prior knowledge that can guide pre-processing code simplification strategies. Moreover, establishing more evaluation mechanisms for code simplification is crucial. Finally, we propose incorporating code simplification methods into the pre-training phase of code pre-trained models to enhance their program comprehension and code representation capabilities.
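A minimal pre-processing simplification of the kind this study evaluates might prune comment tokens from a model's input. The sketch below is hypothetical (it is not one of the surveyed methods) and uses Python's standard tokenize module:

```python
import io
import tokenize

def prune_comments(source):
    """Drop comment tokens from Python source before feeding it to a
    model -- a toy pre-processing code simplification. Real methods
    prune far more aggressively while preserving task effectiveness."""
    kept = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type != tokenize.COMMENT:
            kept.append(tok)
    # untokenize reconstructs source text from the remaining tokens.
    return tokenize.untokenize(kept)

pruned = prune_comments("x = 1  # set x\ny = 2\n")
```

The pruned text still compiles and behaves identically, but contains fewer tokens for the model to process.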
- Research Article
- 10.1186/s12875-025-03021-7
- Sep 29, 2025
- BMC primary care
- Sina Etemadi + 3 more
The primary health care system is acknowledged as the essential entry point to health services. In Iran, primary health care has historically been a key focus for policymakers. However, the accreditation of these services has only recently gained attention as a significant consideration. This study aims to qualitatively examine the prerequisites necessary for the effective implementation of accreditation programs within the primary health care system. This qualitative study was conducted using semi-structured interviews with a diverse group of participants, including specialists from the Iran Ministry of Health, managers of comprehensive health centers in Kerman, physicians, and representatives from the Deputy of Health at Tehran, Kerman, and Mashhad Universities of Medical Sciences. Purposive sampling was utilized through a snowball approach. Content analysis and MAXQDA12 software were used for data analysis. The results showed that various factors are prerequisites for the accreditation program. These requirements were subthemes of the three major concepts introduced by Avedis Donabedian, i.e., structure, process, and outcome. Structural challenges encompassed programs, culture, accreditation platforms, evaluation teams, and motivation. Process challenges included program comprehensiveness, financial resource sustainability, implementation leveling, knowledge translation, implementation protocols, comprehensive training, accreditation standards, and system design. On the basis of the Donabedian model, the results section includes the outcome and expected output. Most of the issues raised by the participants were related to the fundamental and structural defects of the country's healthcare system, in which the challenges of the accreditation program are largely rooted. The prerequisites for effective accreditation are not limited to the process itself; rather, they are heavily influenced by broader systemic issues related to the program, culture, resources, and overall design of the healthcare infrastructure. Addressing these underlying structural problems is crucial for the successful implementation and sustainability of the accreditation program. In any case, without considering these major challenges, the implementation of the accreditation program could face serious problems.
- Research Article
- 10.58578/tsaqofah.v5i6.7139
- Aug 12, 2025
- TSAQOFAH
- Rifqah Nuha Nabiilah + 1 more
Algorithms and Programming play an important role in developing computational thinking skills; however, students’ understanding of this subject, including at MTsN 2 Solok, remains relatively low. This study aims to analyze the effect of applying the Game-Based Learning model using the CodeCombat platform on improving students’ programming comprehension. The research employed a quantitative quasi-experimental design, involving two classes—VIII A and VIII B—selected through purposive sampling, each consisting of 32 students as the experimental and control groups. Data were collected through validated multiple-choice pre-tests and post-tests, then analyzed using the paired sample t-test and independent sample t-test after meeting normality and homogeneity assumptions. The results show a significant improvement in programming comprehension in the experimental class after game-based learning with CodeCombat, with an average post-test score of 80.62 compared to 62.63 in the control class. It is concluded that the application of Game-Based Learning with CodeCombat is effective in enhancing students’ programming comprehension. The implications of this research include enriching the literature on game-based learning in informatics education and providing recommendations for educators to utilize interactive media in programming instruction, while also opening opportunities for further studies on the effects of duration and intensity of educational game use on learning outcomes.
- Research Article
- 10.1007/s10664-025-10699-x
- Jul 24, 2025
- Empirical Software Engineering
- Christian D Newman + 10 more
Abstract Identifier names are crucial components of code, serving as primary clues for developers to understand program behavior. This paper investigates the linguistic structure of identifier names by extending the concept of grammar patterns, which represent the part-of-speech (PoS) sequences underlying identifier phrases. The specific focus is on closed syntactic categories (e.g., prepositions, conjunctions, determiners), which are rarely studied in software engineering despite their central role in general natural language. To study these categories, the Closed Category Identifier Dataset (CCID), a new manually annotated dataset of 1,275 identifiers drawn from 30 open-source systems, is constructed and presented. The relationship between closed-category grammar patterns and program behavior is then analyzed using grounded-theory-inspired coding, statistical, and pattern analysis. The results reveal recurring structures that developers use to express concepts such as control flow, data transformation, temporal reasoning, and other behavioral roles through naming. This work contributes an empirical foundation for understanding how linguistic resources encode behavior in identifier names and supports new directions for research in naming, program comprehension, and education.
- Research Article
- 10.1145/3729400
- Jun 19, 2025
- Proceedings of the ACM on Software Engineering
- Yuvraj Virk + 2 more
A brief, fluent, and relevant summary can be helpful during program comprehension; however, such a summary does require significant human effort to produce. Often, good summaries are unavailable in software projects, which makes maintenance more difficult. There has been a considerable body of research into automated AI-based methods, using Large Language Models (LLMs), to generate summaries of code; there has also been quite a bit of work on ways to measure the performance of such summarization methods, with special attention paid to how closely these AI-generated summaries resemble a summary a human might have produced. Measures such as BERTScore and BLEU have been suggested and evaluated with human-subject studies. However, LLM-generated summaries can be inaccurate, incomplete, etc.: generally, too dissimilar to one that a good developer might write. Given an LLM-generated code summary, how can a user rationally judge if a summary is sufficiently good and reliable? Given just some input source code and an LLM-generated summary, existing approaches can help judge the brevity, fluency, and relevance of the summary; however, it is difficult to gauge whether an LLM-generated summary sufficiently resembles what a human might produce without a "golden" human-produced summary to compare against. Prior research indicates that human-produced summaries are generally preferred by human raters, so we explore this issue in this paper. We study this resemblance question as a calibration problem: given just the code and the summary from an LLM, can we compute a confidence measure that provides a reliable indication of whether the summary sufficiently resembles what a human would have produced in this situation? We examine this question using several LLMs, for several languages, and in several different settings. Our investigation suggests approaches to provide reliable predictions of the likelihood that an LLM-generated summary would sufficiently resemble a summary a human might write for the same code.
- Research Article
- 10.1145/3744739
- Jun 17, 2025
- ACM Transactions on Software Engineering and Methodology
- Annabelle Bergum + 4 more
Background: Neuroimaging methods have proved insightful in program-comprehension research. A key problem is that different baselines have been used in different experiments. A baseline is a task during which the "normal" brain activation is captured as a reference compared to the task of interest. Unfortunately, the influence of the choice of the baseline is still unclear. Aims: We investigate whether and to what extent the selected baseline influences the results of neuroimaging experiments on program comprehension. This helps to understand the trade-offs in baseline selection with the ultimate goal of making the baseline selection informed and transparent. Method: We conducted a pre-registered program-comprehension study with 20 participants using multiple baselines (i.e., reading, calculations, problem solving, and cross-fixation). We monitored brain activation with a 64-channel electroencephalography (EEG) device. We compared how the different baselines affect the results regarding brain activation of program comprehension. Results and Implications: We found significant differences in mental load across baselines, suggesting that selecting a suitable baseline is critical. Our results show that a standard problem-solving task, operationalized by Raven's Progressive Matrices, is a well-suited default baseline for program-comprehension studies. Our results highlight the need for carefully designing and selecting a baseline in program-comprehension studies.
- Research Article
- 10.1145/3729482
- Jun 9, 2025
- Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies
- Liwei Liu + 5 more
IoT device components---digital representations of IoT devices within a platform and typically developed using Software Development Kits (SDKs)---are essential for ensuring seamless connectivity between IoT platforms and physical devices. However, developing these components demands extensive domain knowledge, as developers must understand the necessary elements of an IoT device and effectively utilize SDKs. Unfortunately, limited research has focused on automating this process, resulting in labor-intensive, time-consuming development. To tackle these challenges, we introduce LEGO, a method for synthesizing IoT device components based on the observation that APIs provided by device SDKs would eventually call network protocol methods to access physical devices. LEGO analyzes the SDK source code to identify candidate APIs that communicate with physical devices. Using static analysis, it generates a dataflow-enhanced call graph, extracts call paths containing network protocol methods, and heuristically identifies APIs that invoke these methods. To efficiently classify each API type and infer relevant device properties, LEGO employs a large language model-based program comprehension technique with an information-augmented prompt. LEGO then synthesizes device components using a platform-specific template, built from a common IoT device component model. It assembles IoT device components by populating the template with inferred properties and identified APIs, enabling developers to efficiently develop device components with minimal SDK knowledge. Comprehensive experiments on a set of open-source device SDKs and ten real-world IoT devices demonstrate the efficiency and effectiveness of LEGO in creating IoT device components.
- Research Article
- 10.20527/jmscedu.v5i1.15131
- Jun 8, 2025
- Journal of Mathematics Science and Computer Education
- Indra Maulana + 2 more
This study examines the impact of the SoloLearn online learning platform on students' motivation and understanding of programming in a high school context. The research instruments included a programming comprehension test (pre-test and post-test), a Likert-scale-based learning motivation questionnaire, and an observation sheet for student engagement. All instruments were validated by subject matter experts and tested for reliability, with the motivation questionnaire yielding a Cronbach's Alpha value of 0.927, indicating high internal consistency. The study employed a quasi-experimental design with a pretest-posttest approach involving experimental and control groups. The participants were 11th-grade students selected through purposive sampling. The findings revealed that the experimental group, which used SoloLearn, demonstrated significantly greater motivation and comprehension improvements than the control group. An independent samples t-test indicated a statistically significant difference in post-test scores between the groups, with a p-value of 0.000 and a mean difference of 23.78. These results suggest that SoloLearn is an effective educational tool for enhancing programming instruction and aligns well with the principles of the Merdeka Curriculum.
- Research Article
- 10.1007/s44443-025-00075-6
- Jun 1, 2025
- Journal of King Saud University Computer and Information Sciences
- Yi Rong + 4 more
EduFuncSum: a function-wise progressive transformer for code summarization in undergraduate programming education
- Research Article
- 10.5753/jserd.2025.4803
- May 28, 2025
- Journal of Software Engineering Research and Development
- Djan Santos + 2 more
Background: #ifdefs allow developers to define source code related to features that should or should not be compiled. A feature dependency occurs in a configurable system when source code snippets of different features share code elements, such as variables. Variables that produce feature dependency are called dependent variables. The dependency between two features may involve just one dependent variable or more than one. It is reasonable to suspect that a high number of dependent variables and their use make the analysis of variability scenarios more complex. In fact, previous studies show that #ifdefs may affect comprehensibility, especially when their use implies feature dependency. Aims: In this sense, our goal is to understand how feature dependent variables affect the comprehensibility of configurable system source code. We conducted two complementary empirical studies. In Study 1, we evaluate whether the comprehensibility of configurable system source code varies according to the number of dependent variables. Testing this hypothesis is important so that we can recommend to practitioners and researchers the extent to which writing #ifdef code with dependencies is harmful. In Study 2, we carried out an experiment in which developers analyzed programs with different degrees of variability. Our results show that the degree of variability did not affect the comprehensibility of programs with feature dependent variables. Method: We executed a controlled experiment with 12 participants who analyzed programs trying to specify their output. We quantified comprehensibility using metrics based on time and attempts to answer tasks correctly, participants' visual effort, and participants' heart rate. Results: Our results indicate that the higher the number of dependent variables, the more difficult it was to understand programs with feature dependency. Conclusions: In practice, our results indicate that comprehensibility is more negatively affected in programs with a higher number of dependent variables and when these variables are defined at a point far from the points where they are used.
- Research Article
- 10.1142/s0218194025300015
- May 27, 2025
- International Journal of Software Engineering and Knowledge Engineering
- Rakan Alanazi
In the evolving landscape of software development, where maintaining and understanding complex systems is increasingly challenging, call graph techniques play a critical role in enhancing software comprehension by providing a visual and structural representation of function calls within a system. This paper explores the role of call graphs in simplifying software maintenance and debugging. It highlights how call graphs significantly improve developers' understanding of system architectures and function interactions, reducing the time spent on manual code exploration. Furthermore, the paper surveys recent advancements in call graph techniques, particularly the integration of machine learning and deep learning models with traditional call graph approaches. This hybrid methodology demonstrates enhanced accuracy and relevance in tasks such as program comprehension and code refactoring, making it a valuable tool for modern software engineering practices.
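The basic idea of a static call graph can be sketched in a few lines. The example below is a deliberately simplified, hypothetical Python illustration using the standard ast module; production tools additionally resolve imports, method calls, and dynamic dispatch.

```python
import ast
from collections import defaultdict

def build_call_graph(source):
    """Map each function defined in the source to the plain names it
    calls. Simplification: only direct name calls (f(...)) are tracked;
    attribute calls and nested scoping are ignored."""
    tree = ast.parse(source)
    graph = defaultdict(set)
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            for inner in ast.walk(node):
                if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                    graph[node.name].add(inner.func.id)
    return dict(graph)

# Hypothetical two-function module:
code = """
def helper(n):
    return n + 1

def main():
    return helper(helper(0))
"""
graph = build_call_graph(code)  # main calls helper; helper calls nothing
```

Even this toy graph already answers comprehension questions such as "who calls helper?" without reading the function bodies.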
- Research Article
- 10.1093/iwc/iwaf028
- May 27, 2025
- Interacting with Computers
- Nico Ritschel + 4 more
Abstract End-user programmers need programming tools that are easy to learn and use. Development environments for end-users often support one of two visual modalities: block-based programming or data-flow programming. In this work, we discuss differences in how these modalities represent programs, and why existing block-based programming tools are better suited for imperative tasks while data-flow programming better supports nested expressions. We focus on robot programming as an end-user scenario that requires both imperative and expressions-based code in the same program. To study how end-user tools can better support this scenario, we propose two programming system designs: one that changes how blocks represent nested expressions, and one that combines block-based and data-flow programming in the same hybrid environment. We compared these designs in a controlled experiment with 113 end-user participants who solved programming and program comprehension tasks using one of the two environments. Both groups indicated a small preference for the hybrid system in direct comparison, but participants who used blocks to solve tasks performed better on average than hybrid system users and gave higher usability ratings. These findings suggest that despite the appeal of data-flow programming, a well-adapted block-based programming interface can lead end-users to more programming success.
- Research Article
- 10.1145/3722229
- May 20, 2025
- ACM Transactions on Computing Education
- Eman Abdullah Alomar
Large Language Models (LLMs), such as ChatGPT, have become widely popular for various software engineering tasks, including programming, testing, code review, and program comprehension. However, their impact on improving software quality in educational settings remains uncertain. This article explores our experience teaching the use of Programming Mistake Detector (PMD) to foster a culture of bug fixing and leveraging LLMs to improve software quality in the classroom. This article discusses the results of an experiment involving 155 submissions that carried out a code review activity covering 1,658 reported issues. Our quantitative and qualitative analyses reveal that certain PMD quality issue types influence whether students accept or reject the reported issues, and that design-related categories take longer to resolve. Although students acknowledge the potential of using ChatGPT during code review, some skepticism persists. Further, constructing prompts for ChatGPT that possess clarity, complexity, and context nurtures vital learning outcomes, such as enhanced critical thinking. Among the 1,658 issues analyzed, 93% of students indicated that ChatGPT did not identify any additional issues beyond those detected by PMD. Conversations between students and ChatGPT encompass five categories, including ChatGPT's use of affirmation phrases like "certainly" regarding bug-fixing decisions, and apology phrases such as "apologize" when resolving challenges. Through this experiment, we demonstrate that code review can become an integral part of the educational computing curriculum. We envision our findings enabling educators to support students with effective code review strategies, increasing awareness of LLMs, and promoting software quality in education.
- Research Article
- 10.1186/s43238-025-00196-x
- May 19, 2025
- Built Heritage
- Wanling Jian
The crucial role of local communities in heritage management has received increasing international attention over the past few decades, exemplified by the Five Key Strategies, the Historic Urban Landscape Approach, and the World Heritage Capacity Building Strategy. At the World Heritage 40th anniversary, the importance of local community was further highlighted in achieving sustainable development. The historic town of Vigan was awarded the ‘Best Practice in World Heritage Site Management’ award. The homeowner manual, which was codeveloped by UNESCO and Vigan partners, was developed as a capacity-building tool to educate the local community on conservation techniques. It aims to shape individuals into responsible and capable heritage custodians. However, an accredited evaluation report noted that the effectiveness of global capacity-building initiatives in practice is questionable and might not fully fulfil the target audience’s needs. Using an ethnographic approach to explore community members’ interactions with the homeowner manual, this article reveals the complex process undertaken by local communities in Vigan to develop their capacity for built heritage conservation. The presence of preexisting capacity and varied levels of social capital influence homeowners’ resources available for conservation projects. The difficulties encountered by local communities prompted discussions on the comprehensiveness of capacity-building programs and their adaptability to social dynamics. The findings and lessons learned provide guidelines for future capacity-building program designs targeting local communities at both World Heritage sites and places of cultural significance.