GenUTest: a unit test and mock aspect generation tool

Abstract

Unit testing plays a major role in the software development process. What started as an ad hoc approach is becoming a common practice among developers. It enables the immediate detection of bugs introduced into a unit whenever code changes occur. Hence, unit tests provide a safety net of regression tests and validation tests which encourage developers to refactor existing code with greater confidence. One of the major cornerstones of the agile development approach is unit testing. Agile methods require all software classes to have unit tests that can be executed by an automated unit-testing framework. However, not all software systems have unit tests. When changes to such software are needed, writing unit tests from scratch, which is hard and tedious, might not be cost-effective. In this paper we propose a technique which automatically generates unit tests for software that does not have such tests. We have implemented GenUTest, a prototype tool which captures and logs inter-object interactions occurring during the execution of Java programs, using the aspect-oriented language AspectJ. These interactions are used to generate JUnit tests. They also serve in generating mock aspects, which are mock object-like entities that enable testing units in isolation. The generated JUnit tests and mock aspects are independent of the tool, and can be used by developers to perform unit tests on the software. The comprehensiveness of the unit tests depends on the software execution. We applied GenUTest to several open source projects such as NanoXML and JODE. We present the results, explain the limitations of the tool, and point out directions for future work to improve the code coverage provided by GenUTest and its scalability.
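The capture-then-generate idea described in the abstract can be illustrated with a minimal sketch. This is not GenUTest's implementation: the abstract says the tool captures interactions with AspectJ, whereas the sketch below swaps in a plain `java.lang.reflect.Proxy` so that it runs without an AspectJ compiler. The names `CaptureSketch`, `Counter`, `SimpleCounter`, `Call`, `capture`, and `generateTest` are all hypothetical and chosen only for illustration.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

public class CaptureSketch {
    /** Hypothetical unit under observation; not taken from the paper. */
    public interface Counter {
        int add(int amount);
    }

    /** Hypothetical implementation whose behavior gets recorded. */
    static class SimpleCounter implements Counter {
        private int total;

        @Override
        public int add(int amount) {
            total += amount;
            return total;
        }
    }

    /** One logged interaction: method name, arguments, observed return value. */
    static final class Call {
        final String method;
        final Object[] args;
        final Object result;

        Call(String method, Object[] args, Object result) {
            this.method = method;
            this.args = args;
            this.result = result;
        }
    }

    /**
     * Wraps the target in a dynamic proxy that logs every call made through it.
     * (Assumption: a proxy stands in here for AspectJ-based interception.)
     */
    static Counter capture(Counter target, List<Call> log) {
        InvocationHandler handler = (proxy, method, args) -> {
            Object result = method.invoke(target, args);
            log.add(new Call(method.getName(), args, result));
            return result;
        };
        return (Counter) Proxy.newProxyInstance(
                Counter.class.getClassLoader(),
                new Class<?>[] {Counter.class},
                handler);
    }

    /** Turns the log into JUnit-style test source that replays the recorded calls. */
    static String generateTest(List<Call> log) {
        StringBuilder src = new StringBuilder();
        src.append("@Test\npublic void replayedInteractions() {\n");
        src.append("    Counter c = new SimpleCounter();\n");
        for (Call call : log) {
            // Each recorded return value becomes the expected value of an assertion.
            src.append("    assertEquals(").append(call.result)
               .append(", c.").append(call.method)
               .append("(").append(call.args[0]).append("));\n");
        }
        src.append("}\n");
        return src.toString();
    }

    public static void main(String[] args) {
        List<Call> log = new ArrayList<>();
        Counter counter = capture(new SimpleCounter(), log);
        counter.add(2);
        counter.add(3);
        System.out.print(generateTest(log));
    }
}
```

Running `main` prints the source of a test method that replays the two recorded `add` calls and asserts the return values observed during capture. As in the abstract, such a generated test is only as comprehensive as the execution it was recorded from.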

Similar Papers
  • Book Chapter
  • Cited by 7
  • 10.1007/978-3-540-77966-7_20
GenUTest: A Unit Test and Mock Aspect Generation Tool
  • Oct 23, 2007
  • Benny Pasternak + 2 more

Unit testing plays a major role in the software development process. It enables the immediate detection of bugs introduced into a unit whenever code changes occur. Hence, unit tests provide a safety net of regression tests and validation tests which encourage developers to refactor existing code. Nevertheless, not all software systems contain unit tests. When changes to such software are needed, writing unit tests from scratch might not be cost-effective. In this paper we propose a technique which automatically generates unit tests for software that does not have such tests. We have implemented GenUTest, a tool which captures and logs inter-object interactions occurring during the execution of Java programs. These interactions are used to generate JUnit tests. They also serve in generating mock aspects, which are mock object-like entities that assist the testing process. The interactions are captured using the aspect-oriented language AspectJ.

  • Conference Article
  • Cited by 36
  • 10.1109/icse43902.2021.00138
Automatic Unit Test Generation for Machine Learning Libraries: How Far Are We?
  • May 1, 2021
  • Song Wang + 5 more

Automatic unit test generation, which explores the input space and produces effective test cases for given programs, has been studied for decades. Many unit test generation tools that can help generate unit test cases with high structural coverage over a program have been examined. However, the fact that existing test generation tools are mainly evaluated on general software programs calls into question their practical effectiveness and usefulness for machine learning libraries, which are statistically oriented and fundamentally different in nature and construction from general software projects. In this paper, we set out to investigate the effectiveness of existing unit test generation techniques on machine learning libraries. To investigate this issue, we conducted an empirical study on five widely used machine learning libraries with two popular unit test case generation tools, i.e., EVOSUITE and Randoop. We find that (1) most of the machine learning libraries do not maintain a high-quality unit test suite regarding commonly applied quality metrics such as code coverage (34.1% on average) and mutation score (21.3% on average), (2) unit test case generation tools, i.e., EVOSUITE and Randoop, lead to clear improvements in code coverage and mutation score, although the improvement is limited, and (3) there exist common patterns in the uncovered code across the five machine learning libraries that can be used to improve unit test case generation tasks.

  • Conference Article
  • Cited by 3
  • 10.1145/3593434.3593443
NxtUnit: Automated Unit Test Generation for Go
  • Jun 14, 2023
  • Siwei Wang + 5 more

Automated test generation has been extensively studied for dynamically compiled or typed programming languages like Java and Python. However, Go, a popular statically compiled and typed programming language for server application development, has received limited support from existing tools. To address this gap, we present NxtUnit, an automatic unit test generation tool for Go that uses random testing and is well-suited for microservice architecture. NxtUnit employs a random approach to generate unit tests quickly, making it ideal for smoke testing and providing quick quality feedback. It comes with three types of interfaces: an integrated development environment (IDE) plugin, a command-line interface (CLI), and a browser-based platform. The plugin and CLI tool allow engineers to write unit tests more efficiently, while the platform provides unit test visualization and asynchronous unit test generation. We evaluated NxtUnit by generating unit tests for 13 open-source repositories and 500 ByteDance in-house repositories, resulting in a code coverage of 20.74% for in-house repositories. We conducted a survey among ByteDance engineers and found that NxtUnit can save them 48% of the time on writing unit tests. We have made the CLI tool available at https://github.com/bytedance/nxt_unit.

  • Research Article
  • Cited by 2
  • 10.5555/1667865.1667866
GenUTest: a unit test and mock aspect generation tool
  • Oct 20, 2009
  • Benny Pasternak + 2 more

Unit testing plays a major role in the software development process. What started as an ad hoc approach is becoming a common practice among developers. It enables the immediate detection of bugs in...

  • Research Article
  • Cited by 3
  • 10.1007/s13198-011-0068-3
KeYGenU: combining verification-based and capture and replay techniques for regression unit testing
  • Jun 1, 2011
  • International Journal of System Assurance Engineering and Management
  • Bernhard Beckert + 3 more

Unit testing plays a major role in the software development process. Two essential criteria to achieve effective unit testing are: (1) testing each unit in isolation from other parts of the program and (2) achieving high code coverage. The former requires a lot of extra work such as writing drivers and stubs, whereas the latter is difficult to achieve when manually writing the tests. When changing existing code it is advocated to run the unit tests to avoid regression bugs. However, in many cases legacy software has no unit tests. Writing those tests from scratch is a hard and tedious process, which might not be cost-effective. This paper presents a tool chain approach that combines verification-based testing (VBT) and capture and replay (CaR) test generation methods. We have built a concrete tool chain, KeYGenU, which consists of two existing tools—KeY and GenUTest. The KeY system is a deductive verification and test-generation tool. GenUTest automatically generates JUnit tests for a correctly working software. This combination provides isolated unit test suites with high code-coverage. The generated tests can also be used for regression testing.

  • Conference Article
  • Cited by 3
  • 10.5753/sbes.2024.3561
Detecting Test Smells in Python Test Code Generated by LLM: An Empirical Study with GitHub Copilot
  • Sep 30, 2024
  • Victor Anthony Alves + 3 more

Writing unit tests is a time-consuming and labor-intensive development practice. Consequently, various techniques for automatically generating unit tests have been studied. Among them, the use of Large Language Models (LLMs) has recently emerged as a popular approach for automatically generating tests from natural language descriptions. Although many recent studies are dedicated to measuring the ability of LLMs to write valid unit tests, few evaluate the quality of these generated tests. In this context, this study aims to measure the quality of the test codes generated by GitHub Copilot in Python by detecting test smells in the test cases generated. To do this, we used approaches to generating unit tests by LLMs that have already been applied in the literature and collected a sample of 194 unit test cases in 30 Python test files. We then measured them using tools specialized in detecting test smells in Python. Finally, we conducted an evaluation of these test cases with software developers and software quality assurance professionals. Our results indicated that 47.4% of the tests generated by Copilot had at least one test smell detected, with a lack of documentation in the assertions being the most common quality problem. These findings indicate that although GitHub Copilot can generate valid unit tests, quality violations are still frequently found in these codes.

  • Research Article
  • Cited by 4
  • 10.1145/3638245
Beyond Accuracy: An Empirical Study on Unit Testing in Open-source Deep Learning Projects
  • Apr 18, 2024
  • ACM Transactions on Software Engineering and Methodology
  • Han Wang + 4 more

Deep Learning (DL) models have rapidly advanced, focusing on achieving high performance through testing model accuracy and robustness. However, it is unclear whether DL projects, as software systems, are tested thoroughly or functionally correct when there is a need to treat and test them like other software systems. Therefore, we empirically study the unit tests in open-source DL projects, analyzing 9,129 projects from GitHub. We find that: (1) unit tested DL projects have positive correlation with the open-source project metrics and have a higher acceptance rate of pull requests; (2) 68% of the sampled DL projects are not unit tested at all; (3) the layer and utilities (utils) of DL models have the most unit tests. Based on these findings and previous research outcomes, we built a mapping taxonomy between unit tests and faults in DL projects. We discuss the implications of our findings for developers and researchers and highlight the need for unit testing in open-source DL projects to ensure their reliability and stability. The study contributes to this community by raising awareness of the importance of unit testing in DL projects and encouraging further research in this area.

  • Research Article
  • Cited by 3
  • 10.1145/3715778
Less Is More: On the Importance of Data Quality for Unit Test Generation
  • Jun 19, 2025
  • Proceedings of the ACM on Software Engineering
  • Junwei Zhang + 5 more

Unit testing is crucial for software development and maintenance. Effective unit testing ensures and improves software quality, but writing unit tests is time-consuming and labor-intensive. Recent studies have proposed deep learning (DL) techniques or large language models (LLMs) to automate unit test generation. These models are usually trained or fine-tuned on large-scale datasets. Despite growing awareness of the importance of data quality, there has been limited research on the quality of datasets used for test generation. To bridge this gap, we systematically examine the impact of noise on the performance of learning-based test generation models. We first apply the open card sorting method to analyze the most popular and largest test generation dataset, Methods2Test, to categorize eight distinct types of noise. Further, we conduct detailed interviews with 17 domain experts to validate and assess the importance, reasonableness, and correctness of the noise taxonomy. Then, we propose CleanTest, an automated noise-cleaning framework designed to improve the quality of test generation datasets. CleanTest comprises three filters: a rule-based syntax filter, a rule-based relevance filter, and a model-based coverage filter. To evaluate its effectiveness, we apply CleanTest on two widely-used test generation datasets, i.e., Methods2Test and Atlas. Our findings indicate that 43.52% and 29.65% of datasets contain noise, highlighting its prevalence. Finally, we conduct comparative experiments using four LLMs (i.e., CodeBERT, AthenaTest, StarCoder, and CodeLlama7B) to assess the impact of noise on test generation performance. The results show that filtering noise positively influences the test generation ability of the models. Fine-tuning the four LLMs with the filtered Methods2Test dataset, on average, improves its performance by 67% in branch coverage, using the Defects4J benchmark. For the Atlas dataset, the four LLMs improve branch coverage by 39%. 
Additionally, filtering noise improves bug detection performance, resulting in a 21.42% increase in bugs detected by the generated tests.

  • Research Article
  • Cited by 1
  • 10.1145/3765758
Reference-Based Retrieval-Augmented Unit Test Generation
  • Dec 3, 2025
  • ACM Transactions on Software Engineering and Methodology
  • Zhe Zhang + 5 more

Automated unit test generation has been widely studied, with Large Language Models (LLMs) recently showing significant potential. LLMs like GPT-4, trained on vast text and code data, excel in various code-related tasks, including unit test generation. However, existing LLM-based approaches often focus solely on the context within the code itself, such as referenced variables, while neglecting broader task-specific contexts, such as the utility of referring to existing tests of relevant methods in unit test generation. Moreover, in the context of unit test generation, these tools prioritize high code coverage, often at the expense of practical usability, correctness, and maintainability. In response, we propose Reference-Based Retrieval Augmentation, a novel mechanism that extends LLM-based Retrieval-Augmented Generation (RAG) to retrieve relevant information by considering task-specific context. In the unit test generation task, for a given focal method, the reference relationship is defined as the reusability or referentiality of tests between the focal method and other methods. To generate high-quality unit tests for the focal method, the test reference relationships are then used to retrieve relevant methods and their existing unit tests. Specifically, we account for the unique structure of unit tests by dividing the test generation process into Given, When, and Then phases. When generating unit tests for a focal method, we retrieve pre-existing tests of other relevant methods, which can provide valuable insights for any of the Given, When, and Then phases. We implement this approach in a tool called RefTest, which sequentially performs preprocessing, test reference retrieval, and unit test generation, using an incremental strategy in which newly generated tests guide the creation of subsequent ones. We evaluated RefTest on 12 open-source projects with 1515 methods, and the results demonstrate that RefTest consistently outperforms existing tools in terms of correctness, completeness, and maintainability of the generated tests.

  • Research Article
  • Cited by 75
  • 10.1007/s10851-006-8530-6
Tool-assisted unit-test generation and selection based on operational abstractions
  • Jul 1, 2006
  • Automated Software Engineering
  • Tao Xie + 1 more

Unit testing, a common step in software development, presents a challenge. When produced manually, unit test suites are often insufficient to identify defects. The main alternative is to use one of a variety of automatic unit-test generation tools: these are able to produce and execute a large number of test inputs that extensively exercise the unit under test. However, without a priori specifications, programmers need to manually verify the outputs of these test executions, which is generally impractical. To reduce this cost, unit-test selection techniques may be used to help select a subset of automatically generated test inputs. Then programmers can verify their outputs, equip them with test oracles, and put them into the existing test suite. In this paper, we present the operational violation approach for unit-test generation and selection, a black-box approach without requiring a priori specifications. The approach dynamically generates operational abstractions from executions of the existing unit test suite. These operational abstractions guide test generation tools to generate tests to violate them. The approach selects those generated tests violating operational abstractions for inspection. These selected tests exercise some new behavior that has not been exercised by the existing tests. We implemented this approach by integrating the use of Daikon (a dynamic invariant detection tool) and Parasoft Jtest (a commercial Java unit testing tool), and conducted several experiments to assess the approach.

  • Research Article
  • 10.1145/3729362
UnitCon: Synthesizing Targeted Unit Tests for Java Runtime Exceptions
  • Jun 19, 2025
  • Proceedings of the ACM on Software Engineering
  • Sujin Jang + 3 more

We present UnitCon, a system for synthesizing targeted unit tests for runtime exceptions in Java programs. Targeted unit tests aim to reveal a bug at a specific location in the program under test. This capability benefits various tasks in software development, such as patch testing, crash reproduction, or static analysis alarm inspection. However, conventional unit test generation tools are mainly designed for regression tests by maximizing code coverage; hence they are not effective at such target-specific tasks. In this paper, we propose a novel synthesis technique that effectively guides the search for targeted unit tests. The key idea is to use static analysis to prune and prioritize the search space by estimating the semantics of candidate test cases. This allows us to efficiently focus on promising unit tests that are likely to trigger runtime exceptions at the target location. According to our experiments on a suite of Java programs, our approach outperforms the state-of-the-art unit test generation tools. We also applied UnitCon for inspecting static analysis alarms for null pointer exceptions (NPEs) in 51 open-source projects and discovered 21 previously unknown NPE bugs.

  • Conference Article
  • Cited by 8
  • 10.1109/icodse.2015.7437005
Unit test code generator for lua programming language
  • Nov 1, 2015
  • Junno Tantra Pratama Wibowo + 2 more

Software testing is an important step in the software development lifecycle. One of the main processes that takes a lot of time is developing the test code. We propose automatic unit test code generation to speed up the process and help avoid repetition. We develop the unit test code generator using the Lua programming language. Lua is a fast, lightweight, embeddable scripting language. It has been used in many industrial applications, with a focus on embedded systems and games. Unlike other popular scripting languages such as JavaScript, Python, and Ruby, Lua does not have any unit test generator developed to help its software testing process. The final product, the Lua unit test generator (LUTG), is integrated into one of the most popular Lua IDEs, ZeroBrane Studio, as a plugin to seamlessly connect the coding and testing processes. The code generator can generate unit test code, save test case data in Lua and XML file formats, and generate the test data automatically using a search-based technique, a genetic algorithm, to achieve the full branch coverage test criterion. Using this generator to test several Lua source code files shows that the developed unit test generator can help the unit testing process. The unit test generator is expected to improve the productivity, quality, consistency, and abstraction of the unit testing process.

  • Conference Article
  • Cited by 14
  • 10.1109/apsec.2004.63
JAOUT: Automated Generation of Aspect-Oriented Unit Test
  • Nov 30, 2004
  • Guoqing Xu + 5 more

Unit testing is a methodology for testing small parts of an application independently of whatever application uses them. It is time consuming and tedious to write unit tests, and it is especially difficult to write unit tests that model the pattern of usage of the application. Aspect-oriented programming (AOP) addresses the problem of separation of concerns in programs, which is well suited to unit test problems. What's more, unit tests should be made from different concerns in the application instead of just from functional assertions of correctness or error. In this paper, we first present a new concept, application-specific aspects, which are top-level aspects picked from generic low-level aspects in AOP for specific use. It can be viewed as the separation of concerns on applications of generic low-level aspects. Second, this paper describes an aspect-oriented test description language (AOTDL) and techniques to build top-level aspects for testing on generic aspects. Third, we generate a JUnit unit testing framework and test oracles from AspectJ programs by integrating our tool with AspectJ and JUnit. We use runtime exceptions thrown by testing aspects to decide whether methods work well. Finally, we present a double-phase testing approach to filter out meaningless test cases in our framework.

  • Conference Article
  • 10.5753/sast.2025.14036
On the Energy Footprint of Using a Small Language Model for Unit Test Generation
  • Sep 22, 2025
  • Rafael S Durelli + 2 more

Context. Manual unit test creation is a cognitively intensive and time-consuming activity, prompting researchers and practitioners to increasingly adopt automated testing tools. Recent advancements in language models have expanded automation possibilities, including unit test generation, yet these models raise substantial sustainability concerns due to their energy consumption compared to conventional, specialized tools. Goal. Our research investigates whether the energy overhead associated with employing a small language model (SLM) for unit test generation is justified compared to a conventional, lightweight testing tool. We compare and analyze the energy consumption incurred during test suite generation, as well as the fault-finding effectiveness of the resulting test suites, for an SLM (Phi-3.1 Mini 128k) and Pynguin, a purpose-built tool for unit test generation. Method. We posed two research questions: (i) What is the difference in energy usage between Phi and Pynguin during the generation of unit test suites for Python programs?; and (ii) To what extent do unit test suites generated by Phi and Pynguin differ in their fault-finding effectiveness? To rigorously address the first research question, we employed Bayesian Data Analysis (BDA). For the second research question, we conducted a complementary empirical analysis using descriptive statistics. Results. Our Bayesian analysis provides robust evidence indicating that Phi consistently consumes significantly more energy than Pynguin during test suite generation. Conclusions. These findings underscore significant sustainability concerns associated with employing even SLMs for routine Software Engineering tasks such as unit test generation. The results challenge the assumption of universal energy efficiency benefits from smaller-scale models and emphasize the necessity for careful energy consumption evaluations in the adoption of automated software testing approaches.

  • Conference Article
  • Cited by 41
  • 10.1109/icst.2016.44
Unit Test Generation During Software Development: EvoSuite Plugins for Maven, IntelliJ and Jenkins
  • Apr 1, 2016
  • Andrea Arcuri + 2 more

Different techniques to automatically generate unit tests for object-oriented classes have been proposed, but how to integrate these tools into the daily activities of software development is a little-investigated question. In this paper, we report on our experience in supporting industrial partners in introducing the EvoSuite automated JUnit test generation tool in their software development processes. The first step consisted of providing a plugin to the Apache Maven build infrastructure. The move from a research-oriented point-and-click tool to an automated step of the build process has implications on how developers interact with the tool and generated tests, and therefore, we produced a plugin for the popular IntelliJ Integrated Development Environment (IDE). As build automation is a core component of Continuous Integration (CI), we provide a further plugin to the Jenkins CI system, which allows developers to monitor the results of EvoSuite and integrate generated tests in their source tree. In this paper, we discuss the resulting architecture of the plugins, and the challenges arising when building such plugins. Although the plugins described are targeted for the EvoSuite tool, they can be adapted and their architecture can be reused for other test generation tools as well.
