Tool-assisted unit-test generation and selection based on operational abstractions

  • Abstract
  • Literature Map
  • Similar Papers
Abstract
Translate article icon Translate Article Star icon

Unit testing, a common step in software development, presents a challenge. When produced manually, unit test suites are often insufficient to identify defects. The main alternative is to use one of a variety of automatic unit-test generation tools: these are able to produce and execute a large number of test inputs that extensively exercise the unit under test. However, without a priori specifications, programmers need to manually verify the outputs of these test executions, which is generally impractical. To reduce this cost, unit-test selection techniques may be used to help select a subset of automatically generated test inputs. Then programmers can verify their outputs, equip them with test oracles, and put them into the existing test suite. In this paper, we present the operational violation approach for unit-test generation and selection, a black-box approach without requiring a priori specifications. The approach dynamically generates operational abstractions from executions of the existing unit test suite. These operational abstractions guide test generation tools to generate tests to violate them. The approach selects those generated tests violating operational abstractions for inspection. These selected tests exercise some new behavior that has not been exercised by the existing tests. We implemented this approach by integrating the use of Daikon (a dynamic invariant detection tool) and Parasoft Jtest (a commercial Java unit testing tool), and conducted several experiments to assess the approach.

Similar Papers
  • Conference Article
  • Cite Count Icon 40
  • 10.1109/ase.2003.1240293
Tool-assisted unit test selection based on operational violations
  • Jan 23, 2004
  • Tao Xie + 1 more

Unit testing, a common step in software development, presents a challenge. When produced manually, unit test suites are often insufficient to identify defects. The main alternative is to use one of a variety of automatic unit test generation tools: these are able to produce and execute a large number of test inputs that extensively exercise the unit under test. However, without a priori specifications, developers need to manually verify the outputs of these test executions, which is generally impractical. To reduce this cost, unit test selection techniques may be used to help select a subset of automatically generated test inputs. Then developers can verify their outputs, equip them with test oracles, and put them into the existing test suite. In this paper, we present the operational violation approach for unit test selection, a black-box approach without requiring a priori specifications. The approach dynamically generates operational abstractions from executions of the existing unit test suite. Any automatically generated tests violating the operational abstractions are identified as candidates for selection. In addition, these operational abstractions can guide test generation tools to produce better tests. To experiment dynamic approach, we integrated the use of Daikon (a dynamic invariant detection tool) and Jtest (a commercial Java unit testing tool). An experiment is conducted to assess this approach.

  • Conference Article
  • 10.5753/sast.2025.14036
On the Energy Footprint of Using a Small Language Model for Unit Test Generation
  • Sep 22, 2025
  • Rafael S Durelli + 2 more

Context. Manual unit test creation is a cognitively intensive and time-consuming activity, prompting researchers and practitioners to increasingly adopt automated testing tools. Recent advancements in language models have expanded automation possibilities, including unit test generation, yet these models raise substantial sustainability concerns due to their energy consumption compared to conventional, specialized tools. Goal. Our research investigates whether the energy overhead associated with employing a small language model (SLM) for unit test generation is justified compared to a conventional, lightweight testing tool. We compare and analyze the energy consumption incurred during test suite generation, as well as the fault-finding effectiveness of the resulting test suites, for an SLM (Phi-3.1 Mini 128k) and Pynguin, a purpose-built tool for unit test generation. Method.We posed two research questions: (i) What is the difference in energy usage between Phi and Pynguin during the generation of unit test suites for Python programs?; and (ii) To what extent do unit test suites generated by Phi and Pynguin differ in their fault-finding effectiveness? To rigorously address the first research question, we employed Bayesian Data Analysis (BDA). For the second research question, we conducted a complementary empirical analysis using descriptive statistics. Results. Our Bayesian analysis provides robust evidence indicating that Phi consistently consumes significantly more energy than Pynguin during test suite generation. Conclusions. These findings underscore significant sustainability concerns associated with employing even SLMs for routine Software Engineering tasks such as unit test generation. The results challenge the assumption of universal energy efficiency benefits from smaller-scale models and emphasize the necessity for careful energy consumption evaluations in the adoption of automated software testing approaches.

  • Research Article
  • Cite Count Icon 13
  • 10.1098/rstb.2017.0381
Towards systematic, data-driven validation of a collaborative, multi-scale model of Caenorhabditis elegans.
  • Sep 10, 2018
  • Philosophical Transactions of the Royal Society B: Biological Sciences
  • Richard C Gerkin + 2 more

The OpenWorm Project is an international open-source collaboration to create a multi-scale model of the organism Caenorhabditis elegans At each scale, including subcellular, cellular, network and behaviour, this project employs one or more computational models that aim to recapitulate the corresponding biological system at that scale. This requires that the simulated behaviour of each model be compared with experimental data both as the model is continuously refined and as new experimental data become available. Here we report the use of SciUnit, a software framework for model validation, to attempt to achieve these goals. During project development, each model is continuously subjected to data-driven 'unit tests' that quantitatively summarize model-data agreement, identifying modelling progress and highlighting particular aspects of each model that fail to adequately reproduce known features of the biological organism and its components. This workflow is publicly visible via both GitHub and a web application and accepts community contributions to ensure that modelling goals are transparent and well-informed.This article is part of a discussion meeting issue 'Connectome to behaviour: modelling C. elegans at cellular resolution'.

  • Conference Article
  • Cite Count Icon 36
  • 10.1109/icse43902.2021.00138
Automatic Unit Test Generation for Machine Learning Libraries: How Far Are We?
  • May 1, 2021
  • Song Wang + 5 more

Automatic unit test generation that explores the input space and produces effective test cases for given programs have been studied for decades. Many unit test generation tools that can help generate unit test cases with high structural coverage over a program have been examined. However, the fact that existing test generation tools are mainly evaluated on general software programs calls into question about its practical effectiveness and usefulness for machine learning libraries, which are statistically orientated and have fundamentally different nature and construction from general software projects. In this paper, we set out to investigate the effectiveness of existing unit test generation techniques on machine learning libraries. To investigate this issue, we conducted an empirical study on five widely used machine learning libraries with two popular unit testcase generation tools, i.e., EVOSUITE and Randoop. We find that (1) most of the machine learning libraries do not maintain a high-quality unit test suite regarding commonly applied quality metrics such as code coverage (on average is 34.1%) and mutation score (on average is 21.3%), (2) unit test case generation tools, i.e., EVOSUITE and Randoop, lead to clear improvements in code coverage and mutation score, however, the improvement is limited, and (3) there exist common patterns in the uncovered code across the five machine learning libraries that can be used to improve unit test case generation tasks.

  • Conference Article
  • Cite Count Icon 8
  • 10.1109/icodse.2015.7437005
Unit test code generator for lua programming language
  • Nov 1, 2015
  • Junno Tantra Pratama Wibowo + 2 more

Software testing is an important step in the software development lifecycle. One of the main process that take lots of time is developing the test code. We propose an automatic unit test code generation to speed up the process and helps avoiding repetition. We develop the unit test code generator using Lua programming language. Lua is a fast, lightweight, embeddable scripting language. It has been used in many industrial applications with focuses on embedded systems and games. Unlike other popular scripting language like JavaScript, Python, and Ruby, Lua does not have any unit test generator developed to help its software testing process. The final product, Lua unit test generator (LUTG), integrated to one of the most popular Lua IDE, ZeroBrane Studio, as a plugin to seamlessly connect the coding and testing process. The code generator can generate unit test code, save test cases data on Lua and XML file format, and generate the test data automatically using search-based technique, genetic algorithm, to achieve full branch coverage test criteria. Using this generator to test several Lua source code files shows that the developed unit test generator can help the unit testing process. It was expected that the unit test generator can improve productivity, quality, consistency, and abstraction of unit testing process.

  • Book Chapter
  • Cite Count Icon 10
  • 10.1016/b978-0-12-396526-4.00004-7
Advances on Improving Automation in Developer Testing
  • Jan 1, 2012
  • Advances In Computers
  • Xusheng Xiao + 2 more

Advances on Improving Automation in Developer Testing

  • Research Article
  • Cite Count Icon 1
  • 10.1145/3765758
Reference-Based Retrieval-Augmented Unit Test Generation
  • Dec 3, 2025
  • ACM Transactions on Software Engineering and Methodology
  • Zhe Zhang + 5 more

Automated unit test generation has been widely studied, with Large Language Models (LLMs) recently showing significant potential. LLMs like GPT-4, trained in vast text and code data, excel in various code-related tasks, including unit test generation. However, existing LLM-based approaches often focus solely on the context within the code itself, such as referenced variables, while neglecting broader task-specific contexts, such as the utility of referring to existing tests of relevant methods in unit test generation. Moreover, in the context of unit test generation, these tools prioritize high code coverage, often at the expense of practical usability, correctness, and maintainability. In response, we propose Reference-Based Retrieval Augmentation , a novel mechanism that extends LLM-based Retrieval-Augmented Generation (RAG) to retrieve relevant information by considering task-specific context. In the unit test generation task, for a given focal method, the reference relationships is defined as the reusability or referentiality of tests between the focal method and other methods. To generate high-quality unit tests for the focal method, the test reference relationships are then used to retrieve relevant methods and their existing unit tests. Specifically, we account for the unique structure of unit tests by dividing the test generation process into Given , When , and Then phases. When generating unit tests for a focal method, we retrieve pre-existing tests of other relevant methods, which can provide valuable insights for any of the Given , When , and Then phases. We implement this approach in a tool called RefTest , which sequentially performs preprocessing, test reference retrieval, and unit test generation, using an incremental strategy in which newly generated tests guide the creation of subsequent ones. We evaluated RefTest on 12 open-source projects with 1515 methods, and the results demonstrate that RefTest consistently outperforms existing tools in terms of correctness, completeness, and maintainability of the generated tests.

  • Conference Article
  • Cite Count Icon 3
  • 10.1145/3593434.3593443
NxtUnit: Automated Unit Test Generation for Go
  • Jun 14, 2023
  • Siwei Wang + 5 more

Automated test generation has been extensively studied for dynamically compiled or typed programming languages like Java and Python. However, Go, a popular statically compiled and typed programming language for server application development, has received limited support from existing tools. To address this gap, we present NxtUnit, an automatic unit test generation tool for Go that uses random testing and is well-suited for microservice architecture. NxtUnit employs a random approach to generate unit tests quickly, making it ideal for smoke testing and providing quick quality feedback. It comes with three types of interfaces: an integrated development environment (IDE) plugin, a command-line interface (CLI), and a browser-based platform. The plugin and CLI tool allow engineers to write unit tests more efficiently, while the platform provides unit test visualization and asynchronous unit test generation. We evaluated NxtUnit by generating unit tests for 13 open-source repositories and 500 ByteDance in-house repositories, resulting in a code coverage of 20.74% for in-house repositories. We conducted a survey among Bytedance engineers and found that NxtUnit can save them 48% of the time on writing unit tests. We have made the CLI tool available at https://github.com/bytedance/nxt_unit.

  • Book Chapter
  • 10.1007/978-3-642-04288-1_8
Effects on Unit Tests – Preliminary Analysis
  • Oct 30, 2009
  • Lech Madeyski

With the rising acceptance of XP, and agile methodologies in general, a growing number of software projects develop and maintain large test suites. Tests are considered a kind of a live documentation for the production code, because tests are always kept in sync with the code as opposed to typical text-based documentation which may not be in sync with the code. However, the first and foremost tests are used to execute a program with the intent of finding errors [191]. Hence, not only the thoroughness of developed tests (which is often taken into consideration by means of code coverage measures) but also the fault detection effectiveness of developed tests play the key part in software development. As explained in Sect. 3.3.2.4, code coverage measures can be useful as indicators of the thoroughness of unit test suites [170], while mutation score is a more powerful and more effective measure of the fault finding effectiveness of test suites than statement and branch coverage , and data-flow [82 ,197]. Unfortunately, the empirical evidence on the impact of the TF practice on unit test suite characteristics is limited to code coverage. Therefore, preliminary results, presented in this section, extend the body of knowledge in software engineering by means of the analysis of the impact of the TF practice on mutation score indicator (an indicator of the fault detection effectiveness of developed unit tests).

  • Research Article
  • 10.1002/stvr.443
Editorial: Agility must be good for testing
  • Sep 1, 2010
  • Software Testing, Verification and Reliability
  • Jeff Offutt

Editorial: Agility must be good for testing

  • Research Article
  • Cite Count Icon 82
  • 10.1109/ms.2006.117
Unit tests reloaded: parameterized unit testing with symbolic execution
  • Jul 1, 2006
  • IEEE Software
  • N Tillmann + 1 more

Unit tests are becoming popular. Are there ways to automate the generation of good unit tests? Parameterized unit tests are unit tests that depend on inputs. PUTs describe behavior more concisely than traditional unit tests. We use symbolic execution techniques and constraint solving to find inputs for PUTs that achieve high code coverage, to turn existing unit tests into PUTs, and to generate entirely new PUTs that describe an existing implementation's behavior. Traditional testing benefits from these techniques because test inputs - including the behavior of entire classes - can often be generated automatically from compact PUTs

  • Conference Article
  • Cite Count Icon 4
  • 10.1109/icicos56336.2022.9930600
Understandable Automatic Generated Unit Tests using Semantic and Format Improvement
  • Sep 28, 2022
  • Novi Setiani + 2 more

Unit testing is the important yet the most laborious testing activity because the developer must create and execute unit tests for each class that is created. Unit tests can be created manually by the developer or automatically by using code-based test case generation techniques, such as random, search-based, or symbolic execution techniques. The automatic generated testcases reducing developer effort in writing the unit test for every method and class. However, this automated unit test case more difficult to understand compared to manual unit test. After the unit tests are executed, the results must be validated or maintained by the developer team and they must understand the content of unit tests. This applies not only to the development phase but also to the software maintenance phase. Therefore, understandability is an important aspect that test cases need to have. To explore what kind of test cases are easy for developers to understand, a requirement gathering activities related to understandability improvement in unit test cases is conducted. This research involved the developers and experts in software development by exploring their opinion in semantic, format and function of automatic unit test cases. Based on expert's opinion and developer's interview, requirement list is mapped to the Evosuite and Randoop generated unit test case

  • Research Article
  • Cite Count Icon 1
  • 10.21609/jiki.v17i1.1198
Implementation Genetic Algorithm for Optimization of Kotlin Software Unit Test Case Generator
  • Feb 25, 2024
  • Jurnal Ilmu Komputer dan Informasi
  • Mohammad Andiez Satria Permana + 2 more

Unit testing has a significant role in software development and its impacts depend on the quality of test cases and test data used. To reduce time and effort, unit test generator systems can help automatically generate test cases and test data. However, there is currently no unit test generator for Kotlin programming language even though this language is popularly used for android application developments. In this study, we propose and develop a test generator system that utilizes genetic algorithm (GA) and ANTLR4 parser. GA is used to obtain the most optimal test cases and data for a given Kotlin code. ANTLR4 parser is used to optimize the mutation process in GA so that the mutation process is not totally random. Our model results showed that the average value of code coverage in generated unit tests against instruction coverage is 95.64%, with branch coverage of 76.19% and line coverage of 96.87%. In addition, only two out of eight generated classes produced duplicate test cases with a maximum of one duplication in each class. Therefore, it can be concluded that our optimization with GA on the unit test generator is able to produce unit tests with high code coverage and low duplication.

  • Conference Article
  • Cite Count Icon 27
  • 10.1145/3643795.3648396
Unit Test Generation using Generative AI : A Comparative Performance Analysis of Autogeneration Tools
  • Apr 20, 2024
  • Shreya Bhatia + 3 more

Generating unit tests is a crucial task in software development, demanding substantial time and effort from programmers. The advent of Large Language Models (LLMs) introduces a novel avenue for unit test script generation. This research aims to experimentally investigate the effectiveness of LLMs, specifically exemplified by ChatGPT, for generating unit test scripts for Python programs, and how the generated test cases compare with those generated by an existing unit test generator (Pynguin). For experiments, we consider three types of code units: 1) Procedural scripts, 2) Function-based modular code, and 3) Class-based code. The generated test cases are evaluated based on criteria such as coverage, correctness, and readability. Our results show that ChatGPT's performance is comparable with Pynguin in terms of coverage, though for some cases its performance is superior to Pynguin. We also find that about a third of assertions generated by ChatGPT for some categories were incorrect. Our results also show that there is minimal overlap in missed statements between ChatGPT and Pynguin, thus, suggesting that a combination of both tools may enhance unit test generation performance. Finally, in our experiments, prompt engineering improved ChatGPT's performance, achieving a much higher coverage.*These authors contributed equally.

  • Book Chapter
  • Cite Count Icon 34
  • 10.1007/978-3-642-18070-5_13
JMLUnit: The Next Generation
  • Jan 1, 2011
  • Daniel M Zimmerman + 1 more

Designing unit test suites for object-oriented systems is a painstaking, repetitive, and error-prone task, and significant research has been devoted to the automatic generation of test suites. One method for generating unit tests is to use formal class and method specifications as test oracles and automatically run them with developer-provided data values; for Java code with formal specifications written in the Java Modeling Language, this method is embodied in the JMLUnit tool and the JUnit testing framework on which it is based. While JMLUnit can provide reasonable test coverage when used by a skilled developer, it suffers from several shortcomings including excessive memory utilization during testing and the need to manually write significant amounts of code to generate non-primitive test data objects. In this paper we describe JMLUnitNG, a TestNG-based successor to JMLUnit that can automatically generate and execute millions of tests, using supplied test data of only primitive types, without consuming excessive amounts of memory. We also present a comparison of test coverage between JMLUnitNG and the original JMLUnit.KeywordsTest SuiteSystem Under TestSymbolic ExecutionJava Modeling LanguageTest ExecutionThese keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Save Icon
Up Arrow
Open/Close
Notes

Save Important notes in documents

Highlight text to save as a note, or write notes directly

You can also access these Documents in Paperpal, our AI writing tool

Powered by our AI Writing Assistant