Prompting Techniques for Secure Code Generation: A Systematic Investigation

Abstract

Large Language Models (LLMs) are gaining momentum in software development, with prompt-driven programming enabling developers to create code from Natural Language (NL) instructions. However, studies have questioned their ability to produce secure code and, thereby, the quality of prompt-generated software. In parallel, various prompting techniques that carefully tailor prompts have emerged to elicit optimal responses from LLMs. Still, the interplay between such prompting strategies and secure code generation remains under-explored and calls for further investigation. Objective: In this study, we investigate the impact of different prompting techniques on the security of code generated from NL instructions by LLMs. Method: First, we perform a systematic literature review to identify the existing prompting techniques that can be used for code generation tasks. A subset of these techniques is then evaluated on GPT-3, GPT-3.5, and GPT-4 models for secure code generation. For this, we used an existing dataset consisting of 150 NL security-relevant code generation prompts. Results: Our work (i) classifies potential prompting techniques for code generation, (ii) adapts and evaluates a subset of the identified techniques for secure code generation tasks, and (iii) observes a reduction in security weaknesses across the tested LLMs, especially after using an existing technique called Recursive Criticism and Improvement (RCI), contributing valuable insights to the ongoing discourse on the security of LLM-generated code.
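The RCI technique highlighted in the Results can be illustrated with a short, self-contained sketch. The snippet below is a minimal illustration, not the authors' implementation: it assumes a generic `generate(prompt)` helper (stubbed here with canned strings) standing in for whatever chat-completion API is used, and it follows the published RCI idea of asking the model to critique its own output for security weaknesses and then revise it.

```python
# Minimal sketch of Recursive Criticism and Improvement (RCI) prompting
# for secure code generation. The `generate` helper is a stand-in for a
# real LLM call (e.g., a chat-completion endpoint); replace it as needed.

def generate(prompt: str) -> str:
    """Placeholder LLM call; returns a canned answer so the sketch runs."""
    return "def example():\n    pass  # model output would appear here"

def rci_secure_codegen(task: str, rounds: int = 2) -> str:
    """Generate code for `task`, then repeatedly critique and revise it."""
    code = generate(f"Generate Python code for the following task:\n{task}")
    for _ in range(rounds):
        critique = generate(
            "Review the following code and identify any security "
            f"weaknesses (e.g., CWEs):\n{code}"
        )
        code = generate(
            "Revise the code to fix the identified security weaknesses.\n"
            f"Code:\n{code}\nCritique:\n{critique}"
        )
    return code

if __name__ == "__main__":
    print(rci_secure_codegen("Read a filename from the user and return its contents."))
```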

References (showing 10 of 89 papers)
  • Jigsaw. Naman Jain + 6 more. May 21, 2022. DOI: 10.1145/3510003.3510203. Cited by 100.
  • Guidelines for snowballing in systematic literature studies and a replication in software engineering. Claes Wohlin. May 13, 2014. DOI: 10.1145/2601248.2601268. Cited by 2535.
  • The Capacity for Moral Self-Correction in Large Language Models. Deep Ganguli + 47 more. Feb 14, 2023. DOI: 10.48550/arxiv.2302.07459. Cited by 12.
  • Program Induction by Rationale Generation: Learning to Solve and Explain Algebraic Word Problems. Wang Ling + 3 more. Jan 1, 2017. DOI: 10.18653/v1/p17-1015. Cited by 183.
  • Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them. Mirac Suzgun + 10 more. Jan 1, 2023. DOI: 10.18653/v1/2023.findings-acl.824. Cited by 59.
  • Can a Suit of Armor Conduct Electricity? A New Dataset for Open Book Question Answering. Todor Mihaylov + 3 more. Jan 1, 2018. DOI: 10.18653/v1/d18-1260. Cited by 383.
  • GrIPS: Gradient-free, Edit-based Instruction Search for Prompting Large Language Models. Archiki Prasad + 3 more. Jan 1, 2023. DOI: 10.18653/v1/2023.eacl-main.277. Cited by 20.
  • GPT-NeoX-20B: An Open-Source Autoregressive Language Model. Sidney Black + 16 more. Jan 1, 2022. DOI: 10.18653/v1/2022.bigscience-1.9. Cited by 208.
  • Quality Assessment of ChatGPT Generated Code and their Use by Developers. Mohammed Latif Siddiq + 3 more. Apr 15, 2024. DOI: 10.1145/3643991.3645071. Cited by 8.
  • CoTexT: Multi-task Learning with Code-Text Transformer. Long Phan + 6 more. Jan 1, 2021. DOI: 10.18653/v1/2021.nlp4prog-1.5. Cited by 60.

Similar Papers
  • Research Article
  • 10.54254/2755-2721/2025.ast27451
An Empirical Study of Security Risks for the Web Code Generation by ChatGPT
  • Oct 2, 2025
  • Applied and Computational Engineering
  • Mingrui Hu

Large language models (LLMs) have demonstrated remarkable capabilities in code generation and semantic understanding, enabling ordinary users to generate their own software systems using natural language instructions. This study takes website systems as a case to investigate a user-centered paradigm for code generation and its evaluation. First, users submit their requirements to the LLM via a web interface, prompting the model to automatically generate website project code. Then, through a set of prompt engineering methods and quantitative evaluation techniques developed for this study, we conduct a multi-dimensional assessment of the quality and security of the generated website systems using different types of LLMs and varying system function weights. A hybrid evaluation strategy is proposed to integrate and optimize assessment results across different LLMs. Evaluation dimensions include the degree to which user requirements are satisfied, completeness of website functionality, potential security risks, and code reliability. This research introduces evaluation criteria such as automated review models, functional coverage, and static vulnerability analysis to explore the feasibility, advantages, and limitations of using LLMs as both code generators and reviewers. The findings contribute to our understanding of the practical value of multi-agent LLM collaboration in software development and reveal major current challenges such as functional hallucination, incomplete implementation, and overly optimistic evaluation mechanisms.
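The weighted, multi-dimensional scoring this abstract describes can be made concrete with a small sketch. The dimensions, weights, and aggregation rule below are illustrative assumptions, not the paper's actual formulas: each reviewer model scores the generated site on requirement satisfaction, functional completeness, security risk, and code reliability, and a hybrid score is taken as a weight-adjusted average across reviewers.

```python
# Illustrative sketch of a hybrid, weighted multi-dimensional evaluation.
# Dimension names, weights, and the averaging rule are assumptions for
# demonstration only; the cited study defines its own criteria.

from statistics import mean

# Per-dimension scores (0-10) from two hypothetical reviewer LLMs.
reviews = {
    "reviewer_llm_a": {"requirements": 8, "functionality": 7, "security": 6, "reliability": 7},
    "reviewer_llm_b": {"requirements": 9, "functionality": 6, "security": 5, "reliability": 8},
}

# Function weights reflecting how much each dimension matters for the system.
weights = {"requirements": 0.4, "functionality": 0.3, "security": 0.2, "reliability": 0.1}

def weighted_score(scores: dict) -> float:
    """Weight-adjusted score for one reviewer."""
    return sum(weights[d] * s for d, s in scores.items())

# Hybrid strategy (assumed): average the weighted scores across reviewers.
hybrid = mean(weighted_score(s) for s in reviews.values())
print(f"Hybrid evaluation score: {hybrid:.2f} / 10")
```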

  • Research Article
  • Cited by 1
  • 10.3390/electronics13112002
Exploring the Potential of Large Language Models in Radiological Imaging Systems: Improving User Interface Design and Functional Capabilities
  • May 21, 2024
  • Electronics
  • Luyao Zhang + 6 more

Large language models (LLMs) have demonstrated remarkable capabilities in natural language processing tasks, including conversation, in-context learning, reasoning, and code generation. This paper explores the potential application of LLMs in radiological information systems (RIS) and assesses the impact of integrating LLMs on RIS development and human–computer interaction. We present ChatUI-RIS, a prototype chat-based user interface that leverages LLM capabilities to enhance RIS functionality and user experience. Through an exploratory study involving 26 medical students, we investigate the efficacy of natural language dialogue for learning and operating RIS. Our findings suggest that LLM integration via a chat interface can significantly improve operational efficiency, reduce learning time, and facilitate rapid expansion of RIS capabilities. By interacting with ChatUI-RIS using natural language instructions, medical students can access and retrieve radiology information in a conversational manner. The LLM-powered chat interface not only streamlines user interactions, but also enables more intuitive and efficient navigation of complex RIS functionalities. Furthermore, the natural language processing capabilities of LLMs can be harnessed to automatically generate code snippets and database queries, accelerating RIS development and customization. Preliminary observations indicate that integrating LLMs in RIS has the potential to revolutionize user interface design, enhance system capabilities, and ultimately improve the overall user experience for radiologists and medical professionals.

  • Research Article
  • 10.36948/ijfmr.2024.v06i02.17132
Benchmarking Large Language Models for Code Generation
  • Apr 13, 2024
  • International Journal For Multidisciplinary Research
  • Sumedh Arun Patil + 3 more

As the landscape of software development continues to evolve, the need for efficient and innovative coding practices becomes increasingly apparent. This research endeavors to explore the effectiveness of Large Language Models (LLMs) in code generation, focusing on benchmarking their performance across various coding tasks. Leveraging advanced Natural Language Processing (NLP) techniques and deep learning architectures, our study investigates how LLMs, such as the codellama-13b-instruct.Q5_K_S.gguf engine, interpret and generate code from natural language instructions. With an emphasis on accuracy, efficiency, and user accessibility, our research seeks to shed light on the capabilities of LLMs in bridging the gap between human language and executable code. By evaluating factors such as model architecture, training data quality, and task complexity, we aim to provide insights into the potential of LLMs for revolutionizing the coding experience. Through meticulous benchmarking and analysis, this study aims to contribute to the advancement of LLM development and its applications in code generation, paving the way for more efficient and inclusive coding practices in the future.

  • Research Article
  • Cited by 1
  • 10.15388/lmitt.2024.20
Unit Test Generation Using Large Language Models: A Systematic Literature Review
  • May 13, 2024
  • Vilnius University Open Series
  • Dovydas Marius Zapkus + 1 more

Unit testing is a fundamental aspect of software development, ensuring the correctness and robustness of code implementations. Traditionally, unit tests are manually crafted by developers based on their understanding of the code and its requirements. However, this process can be time-consuming, error-prone, and may overlook certain edge cases. In recent years, there has been growing interest in leveraging large language models (LLMs) for automating the generation of unit tests. LLMs such as GPT (Generative Pre-trained Transformer), CodeT5, StarCoder, and LLaMA have demonstrated remarkable capabilities in natural language understanding and code generation tasks. By using LLMs, researchers aim to develop techniques that automatically generate unit tests from code snippets or specifications, thus optimizing the software testing process. This paper presents a literature review of articles that use LLMs for unit test generation tasks. It also discusses the history of the most commonly used large language models and their parameters, including the first time they were used for code generation tasks. The results of this study present the large language models used for code and unit test generation tasks and their increasing popularity in the code generation domain, indicating great promise for the future of unit test generation using LLMs.
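As a concrete illustration of the prompt-based unit test generation these papers study, the sketch below assembles a prompt from a code snippet and asks a model for pytest-style tests. The `generate` helper is a stub standing in for any of the LLMs named above; the prompt wording is an assumption, not taken from a specific reviewed paper.

```python
# Minimal sketch of LLM-based unit test generation from a code snippet.
# The `generate` function is a placeholder for a real model call.

def generate(prompt: str) -> str:
    """Stub LLM call; a real implementation would query GPT, CodeT5, etc."""
    return "def test_add():\n    assert add(2, 3) == 5"

def generate_unit_tests(code_snippet: str) -> str:
    prompt = (
        "Write pytest unit tests for the following Python function. "
        "Cover normal inputs and at least one edge case.\n\n"
        f"{code_snippet}"
    )
    return generate(prompt)

snippet = "def add(a, b):\n    return a + b"
print(generate_unit_tests(snippet))
```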

  • Research Article
  • 10.55041/ijsrem36242
ProgAI: Enhancing Code Generation with LLMs For Real World Challenges
  • Jul 4, 2024
  • International Journal of Scientific Research in Engineering and Management
  • Afsal Ahamad A + 2 more

Large Language Models (LLMs) have shown promise in automated code generation but produce code units with errors for reasons such as hallucination. Real-world software development, however, often involves complex requirements with intricate dependencies and extensive documentation. To fill this gap, our research pivots towards evaluating LLMs in a more realistic setting: real-world repo-level code generation. We introduce ProgAI, a manually curated LLM for proficient code generation. This LLM supports code generation in four languages, namely C++, Java, Python, and C. We assess nine leading LLMs on code generation tasks and observe a decline in their performance. To tackle this, we present ProgAI, a novel LLM-based agent framework that employs external tools for effective code generation. ProgAI integrates four programming tools, enabling interaction with software artifacts for information retrieval, code symbol navigation, and code testing. We implement four agent strategies to optimize these tools' usage. Our experiments show that ProgAI enhances LLM performance significantly, with improvements ranging from 18.1% to 25%. Further tests on the HumanEval benchmark confirm ProgAI's adaptability and efficacy across various code generation tasks. Notably, ProgAI outperforms commercial products like GitHub Copilot, showcasing superior accuracy and efficiency. These results demonstrate ProgAI's robust capabilities in code generation, highlighting its potential for real-world repo-level coding challenges.
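The tool-using agent loop described here can be sketched generically. The tool names, dispatch logic, and stop condition below are assumptions for illustration and do not reproduce ProgAI's actual design; the idea is simply that the model alternates between requesting a tool (retrieval, symbol navigation, testing) and refining its code with the tool's output.

```python
# Generic sketch of an LLM agent loop that consults external tools while
# generating code. Tool behaviour and the action format are assumed for
# illustration; they are not ProgAI's concrete implementation.

def llm(prompt: str) -> str:
    """Stub model call; returns a final answer immediately so the sketch runs."""
    return "FINISH: def solve():\n    return 42"

TOOLS = {
    "search_docs": lambda query: f"(documentation snippets for '{query}')",
    "find_symbol": lambda name: f"(definition and references of '{name}')",
    "run_tests": lambda code: "(test report: all tests passed)",
}

def agent(task: str, max_steps: int = 5) -> str:
    context = f"Task: {task}"
    for _ in range(max_steps):
        reply = llm(context + "\nEither call a tool as 'TOOL:<name>:<arg>' "
                              "or answer with 'FINISH:<code>'.")
        if reply.startswith("FINISH:"):
            return reply[len("FINISH:"):].strip()
        _, name, arg = reply.split(":", 2)       # parse a tool request
        context += f"\n{name} returned: {TOOLS[name](arg)}"
    return context  # give up after max_steps

print(agent("Implement a function that returns the answer to everything."))
```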

  • Research Article
  • Cited by 2
  • 10.1145/3695868
Building a Coding Assistant via the Retrieval-Augmented Language Model
  • Jan 17, 2025
  • ACM Transactions on Information Systems
  • Xinze Li + 8 more

Pretrained language models have shown strong effectiveness in code-related tasks, such as code retrieval, code generation, code summarization, and code completion tasks. In this article, we propose COde assistaNt viA retrieval-augmeNted language model (CONAN), which aims to build a code assistant by mimicking the knowledge-seeking behaviors of humans during coding. Specifically, it consists of a code structure-aware retriever (CONAN-R) and a dual-view code representation-based retrieval-augmented generation model (CONAN-G). CONAN-R pretrains CodeT5 using Code-Documentation Alignment and Masked Entity Prediction tasks to make language models code structure-aware and learn effective representations for code snippets and documentation. Then CONAN-G designs a dual-view code representation mechanism for implementing a retrieval-augmented code generation model. CONAN-G regards the code documentation descriptions as prompts, which help language models better understand the code semantics. Our experiments show that CONAN achieves convincing performance on different code generation tasks and significantly outperforms previous retrieval augmented code generation models. Our further analyses show that CONAN learns tailored representations for both code snippets and documentation by aligning code-documentation data pairs and capturing structural semantics by masking and predicting entities in the code data. Additionally, the retrieved code snippets and documentation provide necessary information from both program language and natural language to assist the code generation process. CONAN can also be used as an assistant for Large Language Models (LLMs), providing LLMs with external knowledge in shorter code document lengths to improve their effectiveness on various code tasks. It shows the ability of CONAN to extract necessary information and help filter out the noise from retrieved code documents.
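A minimal retrieval-augmented generation loop of the kind CONAN builds on looks roughly like the sketch below. The bag-of-words retriever and prompt layout are simplifications chosen so the example stays self-contained; CONAN itself uses a pretrained, code structure-aware retriever (CONAN-R) and a dual-view generator (CONAN-G).

```python
# Simplified retrieval-augmented code generation: score documentation
# snippets against the query with token overlap, then prepend the best
# match to the generation prompt. A stand-in for learned retrievers.

def generate(prompt: str) -> str:
    """Stub generator; a real system would call a code LLM here."""
    return "# generated code conditioned on the retrieved documentation"

CORPUS = [
    "json.dumps(obj) serializes a Python object to a JSON string.",
    "re.findall(pattern, text) returns all non-overlapping matches.",
    "sqlite3.connect(path) opens a connection to an SQLite database.",
]

def retrieve(query: str, corpus: list[str]) -> str:
    """Pick the snippet with the largest token overlap with the query."""
    q = set(query.lower().split())
    return max(corpus, key=lambda doc: len(q & set(doc.lower().split())))

def rag_codegen(query: str) -> str:
    doc = retrieve(query, CORPUS)
    prompt = f"Documentation:\n{doc}\n\nTask: {query}\nWrite the code:"
    return generate(prompt)

print(rag_codegen("serialize a dict to a JSON string"))
```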

  • Research Article
  • Cited by 11
  • 10.1016/j.knosys.2017.10.023
Generating machine-executable plans from end-user's natural-language instructions
  • Nov 1, 2017
  • Knowledge-Based Systems
  • Rui Liu + 1 more

  • Research Article
  • 10.54097/scrwpt34
Research on Code Generation Technology based on LLM Pre-training
  • Oct 28, 2024
  • Frontiers in Computing and Intelligent Systems
  • Ling Chen

In recent years, large language model (LLM) technology has seen continuous improvement and rapid development, and the pre-trained code generation techniques built on it have attracted extensive attention in industry. Through LLMs, natural language (NL) can be converted into the programming languages (PL) written by professional developers, which greatly lowers the barrier to programming, and pre-training has demonstrated significant performance and advantages in code generation tasks. This paper systematically organizes, studies, and summarizes the pre-trained code generation techniques of recent years. First, a development timeline of pre-trained models related to code generation is extracted from the relevant research results. Second, the characteristics of different pre-trained code generation models are organized and summarized. The evaluation mechanisms and datasets for different pre-trained code generation models are also given, and the research data are compared and analyzed. Finally, in light of the current state of development, future directions for code generation technology are discussed.

  • Research Article
  • Cited by 23
  • 10.1609/aaai.v38i20.30205
MedAlign: A Clinician-Generated Dataset for Instruction Following with Electronic Medical Records
  • Mar 24, 2024
  • Proceedings of the AAAI Conference on Artificial Intelligence
  • Scott L Fleming + 29 more

The ability of large language models (LLMs) to follow natural language instructions with human-level fluency suggests many opportunities in healthcare to reduce administrative burden and improve quality of care. However, evaluating LLMs on realistic text generation tasks for healthcare remains challenging. Existing question answering datasets for electronic health record (EHR) data fail to capture the complexity of information needs and documentation burdens experienced by clinicians. To address these challenges, we introduce MedAlign, a benchmark dataset of 983 natural language instructions for EHR data. MedAlign is curated by 15 clinicians (7 specialities), includes clinician-written reference responses for 303 instructions, and provides 276 longitudinal EHRs for grounding instruction-response pairs. We used MedAlign to evaluate 6 general domain LLMs, having clinicians rank the accuracy and quality of each LLM response. We found high error rates, ranging from 35% (GPT-4) to 68% (MPT-7B-Instruct), and 8.3% drop in accuracy moving from 32k to 2k context lengths for GPT-4. Finally, we report correlations between clinician rankings and automated natural language generation metrics as a way to rank LLMs without human review. We make MedAlign available under a research data use agreement to enable LLM evaluations on tasks aligned with clinician needs and preferences.

  • Research Article
  • 10.1002/smr.70034
Evaluating the Test Adequacy of Benchmarks for LLMs on Code Generation
  • Jun 25, 2025
  • Journal of Software: Evolution and Process
  • Xiangyue Liu + 5 more

Code generation for users' intent has become increasingly prevalent with the rise of large language models (LLMs). To automatically evaluate the effectiveness of these models, multiple execution-based benchmarks have been proposed, consisting of specially crafted tasks accompanied by test cases and a ground-truth solution. LLMs are regarded as well-performing in code generation if they can pass the test cases corresponding to most tasks in these benchmarks. However, it is unknown whether the test cases have sufficient test adequacy and whether test adequacy affects the evaluation. In this paper, we conducted an empirical study to evaluate the test adequacy of execution-based benchmarks and to explore its effects on the evaluation of LLMs. Based on the evaluation of the widely used benchmarks HumanEval and MBPP, and the two enhanced benchmarks HumanEval+ and MBPP+, we obtained the following results: (1) All the evaluated benchmarks have high statement coverage (above 99.16%), low branch coverage (74.39%), and low mutation score (87.69%). Especially for the tasks with higher cyclomatic complexity in HumanEval and MBPP, the mutation score of the test cases is lower. (2) No significant correlation exists between the test adequacy (statement coverage, branch coverage, and mutation score) of benchmarks and evaluation results on LLMs at the individual task level. (3) There is a significant positive correlation between mutation score-based evaluation and another execution-based evaluation metric on LLMs at the individual task level. (4) Existing test case augmentation techniques provide limited improvement in the coverage of test cases in the benchmarks, while significantly improving the mutation score by approximately 34.60%, and can also bring a more rigorous evaluation of LLMs on code generation. (5) The LLM-based test case generation technique (EvalPlus) performs better than the traditional search-based technique (Pynguin) in improving the benchmarks' test quality and code generation evaluation ability.
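To make the adequacy metrics in this abstract concrete: statement and branch coverage are the fraction of statements and branches a benchmark's test cases execute, and mutation score is the fraction of seeded faults (mutants) the tests detect. The numbers and the tiny scoring helper below are illustrative only, not values from the study.

```python
# Illustrative computation of the test-adequacy metrics discussed above.
# All counts are made up for demonstration.

def ratio(covered: int, total: int) -> float:
    return covered / total if total else 1.0

statements_covered, statements_total = 119, 120    # statement coverage
branches_covered, branches_total = 60, 80          # branch coverage
mutants_killed, mutants_total = 70, 80             # mutation score

print(f"statement coverage: {ratio(statements_covered, statements_total):.2%}")
print(f"branch coverage:    {ratio(branches_covered, branches_total):.2%}")
print(f"mutation score:     {ratio(mutants_killed, mutants_total):.2%}")
```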

  • Research Article
  • 10.1145/3728947
The First Prompt Counts the Most! An Evaluation of Large Language Models on Iterative Example-Based Code Generation
  • Jun 22, 2025
  • Proceedings of the ACM on Software Engineering
  • Yingjie Fu + 4 more

The capabilities of Large Language Models (LLMs) in code generation have been extensively studied, particularly for implementing target functionalities from natural-language descriptions. As an alternative to natural language, input-output (I/O) examples provide an accessible, unambiguous, and flexible way to describe functionalities. However, their inherent diversity, opaqueness, and incompleteness impose greater challenges for understanding and implementing the target requirements. Therefore, generating code from I/O examples (i.e., example-based code generation) provides a new perspective, allowing us to additionally evaluate LLMs’ capability to infer target functionalities from limited information and to process new-form requirements. However, related research about LLMs in example-based code generation remains largely unexplored. To fill this gap, this paper presents the first comprehensive study on example-based code generation using LLMs. To address the incorrectness caused by the incompleteness of I/O examples, we adopt an iterative evaluation framework and formalize the objective of example-based code generation as two sequential sub-objectives: generating code conforming to the given examples and generating code that successfully implements the target functionalities from (iteratively) given examples. We assess six state-of-the-art LLMs using a new benchmark of 172 diverse target functionalities (derived from HumanEval and CodeHunt). The results demonstrate that when requirements are described using iterative I/O examples rather than natural language, the LLMs’ score decreases by over 60%, indicating that example-based code generation remains challenging for the evaluated LLMs. Notably, the vast majority (even over 95%) of successfully implemented functionalities are achieved in the first round of the iterations, suggesting that the LLMs struggle to effectively utilize the iteratively supplemented requirements. Furthermore, we find that combining I/O examples with even imprecise and fragmental natural language descriptions greatly improves LLM performance, and the selection of initial I/O examples can also influence the score, suggesting opportunities for prompt optimization. These findings highlight the importance of early prompts during interactions and offer critical insights and implications for enhancing LLM-based code generation.
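The iterative evaluation framework described here can be sketched as a simple loop: generate code from the current I/O examples, check it against all known examples, and if it fails, feed a failing example back as an additional requirement. The example task, example format, and stubbed model call below are assumptions made to keep the sketch runnable.

```python
# Sketch of iterative example-based code generation: the model sees I/O
# examples, its output is executed against them, and failing examples are
# appended to the prompt for the next round. The model call is stubbed.

def generate_code(examples: list[tuple[int, int]]) -> str:
    """Stub for an LLM that writes a function `f` from I/O examples."""
    return "def f(x):\n    return x * 2"  # pretend output of the model

def failing_examples(code: str, examples: list[tuple[int, int]]):
    namespace: dict = {}
    exec(code, namespace)                # define f from the generated code
    f = namespace["f"]
    return [(x, y) for x, y in examples if f(x) != y]

def iterative_codegen(all_examples, initial=2, max_rounds=3):
    shown = list(all_examples[:initial])
    for _ in range(max_rounds):
        code = generate_code(shown)
        failures = failing_examples(code, all_examples)
        if not failures:
            return code                  # conforms to every known example
        shown.append(failures[0])        # iteratively supplement requirements
    return code

examples = [(1, 2), (2, 4), (5, 10)]     # target functionality: f(x) = 2 * x
print(iterative_codegen(examples))
```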

  • Research Article
  • 10.1145/3772721
Exploring Data-Efficient Adaptation of Large Language Models for Code Generation
  • Oct 27, 2025
  • ACM Transactions on Software Engineering and Methodology
  • Xue Jiang + 5 more

Although Large Language Models (LLMs) have made significant progress in code generation, they still struggle with code generation tasks in specific scenarios. These scenarios usually necessitate the adaptation of LLMs to fulfill specific needs, but the limited training data available in practice leads to poor code generation performance. Therefore, how to effectively adapt LLMs to new scenarios with little training data is a major challenge for current code generation. In this paper, we propose a novel adaptation approach named DEED, which stands for Data-Efficient adaptation with Error-Driven learning for code generation. DEED leverages the errors made by LLMs as learning opportunities, using error revision to overcome their own shortcomings and thus achieve efficient learning. Specifically, DEED involves identifying erroneous code generated by LLMs, employing Self-Revise for code revision, optimizing the model with the revised code, and iterating the process for continuous improvement. Experimental results show that, compared to other mainstream fine-tuning approaches, DEED achieves superior performance with little training data, showing an average relative improvement of 46.2% in Pass@1 on multiple code generation benchmarks. We also validate the effectiveness of Self-Revise, which generates revised code that optimizes the model more efficiently than the code samples from datasets. Moreover, DEED consistently demonstrates strong performance across various LLMs, underscoring its applicability.
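The error-driven loop DEED describes (generate, detect failures, self-revise, fine-tune on the revisions, repeat) can be outlined as below. Everything here is an assumption-level sketch: the model calls, the test-based error detection, and the fine-tuning step are stubs that only show the control flow, not the paper's actual training procedure.

```python
# Control-flow sketch of error-driven adaptation: collect the model's own
# failures, revise them, and use the revised code as training data. The
# generation, revision, and fine-tuning functions are all stubs.

def model_generate(task: str) -> str:
    return "def solve():\n    return None   # initially wrong"

def passes_tests(task: str, code: str) -> bool:
    return "None" not in code               # toy stand-in for running tests

def self_revise(task: str, bad_code: str) -> str:
    return "def solve():\n    return 42    # revised solution"

def fine_tune(training_pairs: list[tuple[str, str]]) -> None:
    print(f"fine-tuning on {len(training_pairs)} revised samples")

def error_driven_adaptation(tasks: list[str], iterations: int = 2) -> None:
    for _ in range(iterations):
        revised = []
        for task in tasks:
            code = model_generate(task)
            if not passes_tests(task, code):          # identify error code
                code = self_revise(task, code)        # revise it
                revised.append((task, code))
        if revised:
            fine_tune(revised)                        # optimize on revisions

error_driven_adaptation(["reverse a string", "sum a list"])
```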

  • Research Article
  • 10.3390/aerospace12060498
Using Large Language Models for Aerospace Code Generation: Methods, Benchmarks, and Potential Values
  • May 30, 2025
  • Aerospace
  • Rui He + 4 more

In recent years, Large Language Models (LLMs) have witnessed rapid advancements, revolutionizing various domains. Within the realm of software development, code generation technology powered by LLMs has emerged as a prominent research focus. Despite its potential, the application of this technology in the aerospace sector remains in its nascent, exploratory phase. This paper delves into the intricacies of LLM-based code generation methods and explores their potential applications in aerospace contexts. It introduces RepoSpace, the pioneering warehouse-level benchmark test for code generation of spaceborne equipment. Comprising 825 samples from five actual projects, this benchmark offers a more precise evaluation of LLMs’ capabilities in aerospace scenarios. Through extensive evaluations of seven state-of-the-art LLMs on RepoSpace, the study reveals that domain-specific differences significantly impact the code generation performance of LLMs. Existing LLMs exhibit subpar performance in specialized warehouse-level code generation tasks for aerospace, with their performance markedly lower than that of domain tasks. The research further demonstrates that Retrieval Augmented Generation (RAG) technology can effectively enhance LLMs’ code generation capabilities. Additionally, the use of appropriate prompt templates can guide the models to achieve superior results. Moreover, high-quality documentation strings are found to be crucial in improving LLMs’ performance in warehouse-level code generation tasks. This study provides a vital reference for leveraging LLMs for code generation in the aerospace field, thereby fostering technological innovation and progress in this critical domain.

  • Research Article
  • Cited by 35
  • 10.1145/3708882
Recommendation as Instruction Following: A Large Language Model Empowered Recommendation Approach
  • Jul 10, 2025
  • ACM Transactions on Information Systems
  • Junjie Zhang + 5 more

In the past few decades, recommender systems have attracted much attention in both research and industry communities. Existing recommendation models mainly learn the underlying user preference from historical behavior data (typically in the forms of item IDs), and then estimate the user–item matching relationships for recommendations. Inspired by the recent progress on large language models (LLMs), we develop a different recommendation paradigm, considering recommendation as instruction following by LLMs. The key idea is that the needs of a user can be expressed in natural language descriptions (called instructions ), so that LLMs can understand and further execute the instruction for fulfilling the recommendation. For this purpose, we instruction tune the 3B Flan-T5-XL, to better adapt LLMs to recommender systems. We first design a general instruction format for describing the preference, intention, and task form of a user in natural language. Then we manually design 39 instruction templates and automatically generate large amounts of user-personalized instruction data with varying types of preferences and intentions. To demonstrate the effectiveness of our approach, we instantiate the instructions into several widely studied recommendation (or search) tasks, and conduct extensive experiments with real-world datasets. Experiment results show that our approach can outperform several competitive baselines, including the powerful GPT-3.5, on these evaluation tasks. Our approach sheds light on developing user-friendly recommender systems, in which users can freely communicate with the system and obtain accurate recommendations via natural language instructions.
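The "recommendation as instruction following" setup boils down to serializing a user's preference, intention, and task form into one natural-language instruction for the tuned model. The template wording and fields below are assumptions for illustration; the paper designs 39 templates of its own.

```python
# Illustrative instantiation of an instruction template that turns user
# preference, intention, and task form into a natural-language prompt.
# The template text is an assumption, not one of the paper's 39 templates.

TEMPLATE = (
    "The user prefers {preference}. "
    "The user currently wants {intention}. "
    "Task: {task}. Recommend suitable items."
)

def build_instruction(preference: str, intention: str, task: str) -> str:
    return TEMPLATE.format(preference=preference, intention=intention, task=task)

instruction = build_instruction(
    preference="science-fiction novels with hard-science themes",
    intention="a gift for a colleague who enjoys space exploration",
    task="sequential recommendation based on recently viewed books",
)
print(instruction)  # this string would be fed to the instruction-tuned LLM
```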

  • Research Article
  • Cited by 8
  • 10.1287/ijds.2023.0007
How Can IJDS Authors, Reviewers, and Editors Use (and Misuse) Generative AI?
  • Apr 1, 2023
  • INFORMS Journal on Data Science
  • Galit Shmueli + 7 more

More from: ACM Transactions on Software Engineering and Methodology
  • Research Article
  • 10.1145/3774889
False-Positive Bug Reports in Deep Learning Compilers: Stages, Root Causes, and Mitigation
  • Nov 6, 2025
  • ACM Transactions on Software Engineering and Methodology
  • Lili Huang + 5 more

  • Research Article
  • 10.1145/3773287
Larger Is Not Always Better: Exploring Small Open-source Language Models in Logging Statement Generation
  • Oct 28, 2025
  • ACM Transactions on Software Engineering and Methodology
  • Renyi Zhong + 6 more

  • Research Article
  • 10.1145/3773285
TaskEval: Assessing Difficulty of Code Generation Tasks for Large Language Models
  • Oct 28, 2025
  • ACM Transactions on Software Engineering and Methodology
  • Florian Tambon + 4 more

  • Research Article
  • 10.1145/3773034
Synthesizing Efficient and Permissive Programmatic Runtime Shields for Neural Policies
  • Oct 27, 2025
  • ACM Transactions on Software Engineering and Methodology
  • Jieke Shi + 4 more

  • Research Article
  • 10.1145/3772721
Exploring Data-Efficient Adaptation of Large Language Models for Code Generation
  • Oct 27, 2025
  • ACM Transactions on Software Engineering and Methodology
  • Xue Jiang + 5 more

  • Research Article
  • 10.1145/3773088
Causally Perturbed Fairness Testing
  • Oct 27, 2025
  • ACM Transactions on Software Engineering and Methodology
  • Chengwen Du + 1 more

  • Research Article
  • 10.1145/3773084
Large Language Models for Constructing and Optimizing Machine Learning Workflows: A Survey
  • Oct 27, 2025
  • ACM Transactions on Software Engineering and Methodology
  • Yang Gu + 5 more

  • Research Article
  • 10.1145/3772084
SETS: A Simple yet Effective DNN Test Selection Approach
  • Oct 18, 2025
  • ACM Transactions on Software Engineering and Methodology
  • Jingling Wang + 4 more

  • Research Article
  • 10.1145/3771929
Continuously Learning Bug Locations
  • Oct 18, 2025
  • ACM Transactions on Software Engineering and Methodology
  • Paulina Stevia Nouwou Mindom + 3 more

  • Research Article
  • 10.1145/3733715
Stress Testing Control Loops in Cyber-Physical Systems—RCR Report
  • Oct 17, 2025
  • ACM Transactions on Software Engineering and Methodology
  • Claudio Mandrioli + 4 more
