Unit tests reloaded: parameterized unit testing with symbolic execution

  • TL;DR
  • Abstract
  • Literature Map
  • Similar Papers
TL;DR

This paper explores automating the generation of parameterized unit tests using symbolic execution and constraint solving to achieve high code coverage, convert existing tests into PUTs, and create new PUTs that accurately describe implementation behavior, enhancing testing efficiency and conciseness.

Abstract
Translate article icon Translate Article Star icon

Unit tests are becoming popular. Are there ways to automate the generation of good unit tests? Parameterized unit tests are unit tests that depend on inputs. PUTs describe behavior more concisely than traditional unit tests. We use symbolic execution techniques and constraint solving to find inputs for PUTs that achieve high code coverage, to turn existing unit tests into PUTs, and to generate entirely new PUTs that describe an existing implementation's behavior. Traditional testing benefits from these techniques because test inputs - including the behavior of entire classes - can often be generated automatically from compact PUTs

Similar Papers
  • Conference Article
  • Cite Count Icon 17
  • 10.1145/2889160.2891056
Advances in unit testing
  • May 14, 2016
  • Tao Xie + 2 more

Parameterized unit testing, recent advances in unit testing, is a new methodology extending the previous industry practice based on traditional unit tests without parameters. A parameterized unit test (PUT) is simply a test method that takes parameters, calls the code under test, and states assertions. Parameterized unit testing allows the separation of two testing concerns or tasks: the specification of external, black-box behavior (i.e., assertions or specifications) by developers and the generation and selection of internal, white-box test inputs (i.e., high-code-covering test inputs) by tools. PUTs have been supported by various testing frameworks. Various open source and industrial testing tools also exist to generate test inputs for PUTs. This technical briefing presents latest research on principles and techniques, as well as practical considerations to apply parameterized unit testing on real-world programs, highlighting success stories, research and education achievements, and future research directions in developer testing.

  • Conference Article
  • 10.4230/lipics.ecoop.2018.5
A characteristic study of parameterized unit tests in .NET open source projects
  • Jul 1, 2018
  • DROPS (Schloss Dagstuhl – Leibniz Center for Informatics)
  • Wing Lam + 6 more

In the past decade, parameterized unit testing has emerged as a promising method to specify program behaviors under test in the form of unit tests. Developers can write parameterized unit tests (PUTs), unit-test methods with parameters, in contrast to conventional unit tests, without parameters. The use of PUTs can enable powerful test generation tools such as Pex to have strong test oracles to check against, beyond just uncaught runtime exceptions. In addition, PUTs have been popularly supported by various unit testing frameworks for .NET and the JUnit framework for Java. However, there exists no study to offer insights on how PUTs are written by developers in either proprietary or open source development practices, posing barriers for various stakeholders to bring PUTs to widely adopted practices in software industry. To fill this gap, we first present categorization results of the Microsoft MSDN Pex Forum posts (contributed primarily by industrial practitioners) related to PUTs. We then use the categorization results to guide the design of the first characteristic study of PUTs in .NET open source projects. We study hundreds of PUTs that open source developers wrote for these open source projects. Our study findings provide valuable insights for various stakeholders such as current or prospective PUT writers (e.g., developers), PUT framework designers, test-generation tool vendors, testing researchers, and testing educators.

  • Book Chapter
  • Cite Count Icon 33
  • 10.1007/978-3-642-13977-2_8
DyGen: Automatic Generation of High-Coverage Tests via Mining Gigabytes of Dynamic Traces
  • Jan 1, 2010
  • Suresh Thummalapenta + 3 more

Unit tests of object-oriented code exercise particular sequences of method calls. A key problem when automatically generating unit tests that achieve high structural code coverage is the selection of relevant method-call sequences, since the number of potentially relevant sequences explodes with the number of methods. To address this issue, we propose a novel approach, called DyGen, that generates tests via mining dynamic traces recorded during program executions. Typical program executions tend to exercise only happy paths that do not include error-handling code, and thus recorded traces often do not achieve high structural coverage. To increase coverage, DyGen transforms traces into parameterized unit tests (PUTs) and uses dynamic symbolic execution to generate new unit tests for the PUTs that can achieve high structural code coverage. In this paper, we show an application of DyGen by automatically generating regression tests on a given version of software.

  • Research Article
  • Cite Count Icon 62
  • 10.1145/1095430.1081749
Parameterized unit tests
  • Sep 1, 2005
  • ACM SIGSOFT Software Engineering Notes
  • Nikolai Tillmann + 1 more

Parameterized unit tests extend the current industry practice of using closed unit tests defined as parameterless methods. Parameterized unit tests separate two concerns: 1) They specify the external behavior of the involved methods for all test arguments. 2) Test cases can be re-obtained as traditional closed unit tests by instantiating the parameterized unit tests. Symbolic execution and constraint solving can be used to automatically choose a minimal set of inputs that exercise a parameterized unit test with respect to possible code paths of the implementation. In addition, parameterized unit tests can be used as symbolic summaries which allows symbolic execution to scale for arbitrary abstraction levels. We have developed a prototype tool which computes test cases from parameterized unit tests. We report on its first use testing parts of the .NET base class library.

  • Conference Article
  • Cite Count Icon 222
  • 10.1145/1081706.1081749
Parameterized unit tests
  • Sep 1, 2005
  • Nikolai Tillmann + 1 more

Parameterized unit tests extend the current industry practice of using closed unit tests defined as parameterless methods. Parameterized unit tests separate two concerns: 1) They specify the external behavior of the involved methods for all test arguments. 2) Test cases can be re-obtained as traditional closed unit tests by instantiating the parameterized unit tests. Symbolic execution and constraint solving can be used to automatically choose a minimal set of inputs that exercise a parameterized unit test with respect to possible code paths of the implementation. In addition, parameterized unit tests can be used as symbolic summaries which allows symbolic execution to scale for arbitrary abstraction levels. We have developed a prototype tool which computes test cases from parameterized unit tests. We report on its first use testing parts of the .NET base class library.

  • Book Chapter
  • Cite Count Icon 29
  • 10.1007/978-3-642-19811-3_21
Retrofitting Unit Tests for Parameterized Unit Testing
  • Jan 1, 2011
  • Suresh Thummalapenta + 4 more

Recent advances in software testing introduced parameterized unit tests (PUT), which accept parameters, unlike conventional unit tests (CUT), which do not accept parameters. PUTs are more beneficial than CUTs with regards to fault-detection capability, since PUTs help describe the behaviors of methods under test for all test arguments. In general, existing applications often include manually written CUTs. With the existence of these CUTs, natural questions that arise are whether these CUTs can be retrofitted as PUTs to leverage the benefits of PUTs, and what are the cost and benefits involved in retrofitting CUTs as PUTs. To address these questions, in this paper, we conduct an empirical study to investigate whether existing CUTs can be retrofitted as PUTs with feasible effort and achieve the benefits of PUTs in terms of additional fault-detection capability and code coverage. We also propose a methodology, called test generalization, that helps in systematically retrofitting existing CUTs as PUTs. Our results on three real-world open-source applications (≈ 4.6 KLOC) show that the retrofitted PUTs detect 19 new defects that are not detected by existing CUTs, and also increase branch coverage by 4% on average (with maximum increase of 52% for one class under test and 10% for one application under analysis) with feasible effort.

  • Conference Article
  • Cite Count Icon 4
  • 10.1109/icicos56336.2022.9930600
Understandable Automatic Generated Unit Tests using Semantic and Format Improvement
  • Sep 28, 2022
  • Novi Setiani + 2 more

Unit testing is the important yet the most laborious testing activity because the developer must create and execute unit tests for each class that is created. Unit tests can be created manually by the developer or automatically by using code-based test case generation techniques, such as random, search-based, or symbolic execution techniques. The automatic generated testcases reducing developer effort in writing the unit test for every method and class. However, this automated unit test case more difficult to understand compared to manual unit test. After the unit tests are executed, the results must be validated or maintained by the developer team and they must understand the content of unit tests. This applies not only to the development phase but also to the software maintenance phase. Therefore, understandability is an important aspect that test cases need to have. To explore what kind of test cases are easy for developers to understand, a requirement gathering activities related to understandability improvement in unit test cases is conducted. This research involved the developers and experts in software development by exploring their opinion in semantic, format and function of automatic unit test cases. Based on expert's opinion and developer's interview, requirement list is mapped to the Evosuite and Randoop generated unit test case

  • Conference Article
  • Cite Count Icon 1
  • 10.1109/kse.2009.47
A Parameterized Unit Test Framework Based on Symbolic Java PathFinder
  • Oct 1, 2009
  • Anh-Hoang Truong + 1 more

Parameterized unit test recently gains a lot of attention as it saves testing cost and is more efficient in term of code coverage. We present a framework for running parameterized unit tests (PUT) based on Java PathFinder (JPF) and JUnit. Our approach bases on model checking and symbolic execution of JPF for generating standard unit tests. As a result, we achieve high structural and path coverage. The generated unit tests are automatically executed by JUnit so programmers receive immediately assertion failures if any. Currently, our approach mainly works with numeric and boolean data type but it is possible to extend our framework for other data types such as string.

  • Research Article
  • Cite Count Icon 1
  • 10.1145/3765758
Reference-Based Retrieval-Augmented Unit Test Generation
  • Dec 3, 2025
  • ACM Transactions on Software Engineering and Methodology
  • Zhe Zhang + 5 more

Automated unit test generation has been widely studied, with Large Language Models (LLMs) recently showing significant potential. LLMs like GPT-4, trained in vast text and code data, excel in various code-related tasks, including unit test generation. However, existing LLM-based approaches often focus solely on the context within the code itself, such as referenced variables, while neglecting broader task-specific contexts, such as the utility of referring to existing tests of relevant methods in unit test generation. Moreover, in the context of unit test generation, these tools prioritize high code coverage, often at the expense of practical usability, correctness, and maintainability. In response, we propose Reference-Based Retrieval Augmentation , a novel mechanism that extends LLM-based Retrieval-Augmented Generation (RAG) to retrieve relevant information by considering task-specific context. In the unit test generation task, for a given focal method, the reference relationships is defined as the reusability or referentiality of tests between the focal method and other methods. To generate high-quality unit tests for the focal method, the test reference relationships are then used to retrieve relevant methods and their existing unit tests. Specifically, we account for the unique structure of unit tests by dividing the test generation process into Given , When , and Then phases. When generating unit tests for a focal method, we retrieve pre-existing tests of other relevant methods, which can provide valuable insights for any of the Given , When , and Then phases. We implement this approach in a tool called RefTest , which sequentially performs preprocessing, test reference retrieval, and unit test generation, using an incremental strategy in which newly generated tests guide the creation of subsequent ones. We evaluated RefTest on 12 open-source projects with 1515 methods, and the results demonstrate that RefTest consistently outperforms existing tools in terms of correctness, completeness, and maintainability of the generated tests.

  • Research Article
  • Cite Count Icon 1
  • 10.21609/jiki.v17i1.1198
Implementation Genetic Algorithm for Optimization of Kotlin Software Unit Test Case Generator
  • Feb 25, 2024
  • Jurnal Ilmu Komputer dan Informasi
  • Mohammad Andiez Satria Permana + 2 more

Unit testing has a significant role in software development and its impacts depend on the quality of test cases and test data used. To reduce time and effort, unit test generator systems can help automatically generate test cases and test data. However, there is currently no unit test generator for Kotlin programming language even though this language is popularly used for android application developments. In this study, we propose and develop a test generator system that utilizes genetic algorithm (GA) and ANTLR4 parser. GA is used to obtain the most optimal test cases and data for a given Kotlin code. ANTLR4 parser is used to optimize the mutation process in GA so that the mutation process is not totally random. Our model results showed that the average value of code coverage in generated unit tests against instruction coverage is 95.64%, with branch coverage of 76.19% and line coverage of 96.87%. In addition, only two out of eight generated classes produced duplicate test cases with a maximum of one duplication in each class. Therefore, it can be concluded that our optimization with GA on the unit test generator is able to produce unit tests with high code coverage and low duplication.

  • Conference Article
  • Cite Count Icon 36
  • 10.1109/icse43902.2021.00138
Automatic Unit Test Generation for Machine Learning Libraries: How Far Are We?
  • May 1, 2021
  • Song Wang + 5 more

Automatic unit test generation that explores the input space and produces effective test cases for given programs have been studied for decades. Many unit test generation tools that can help generate unit test cases with high structural coverage over a program have been examined. However, the fact that existing test generation tools are mainly evaluated on general software programs calls into question about its practical effectiveness and usefulness for machine learning libraries, which are statistically orientated and have fundamentally different nature and construction from general software projects. In this paper, we set out to investigate the effectiveness of existing unit test generation techniques on machine learning libraries. To investigate this issue, we conducted an empirical study on five widely used machine learning libraries with two popular unit testcase generation tools, i.e., EVOSUITE and Randoop. We find that (1) most of the machine learning libraries do not maintain a high-quality unit test suite regarding commonly applied quality metrics such as code coverage (on average is 34.1%) and mutation score (on average is 21.3%), (2) unit test case generation tools, i.e., EVOSUITE and Randoop, lead to clear improvements in code coverage and mutation score, however, the improvement is limited, and (3) there exist common patterns in the uncovered code across the five machine learning libraries that can be used to improve unit test case generation tasks.

  • Research Article
  • Cite Count Icon 75
  • 10.1007/s10851-006-8530-6
Tool-assisted unit-test generation and selection based on operational abstractions
  • Jul 1, 2006
  • Automated Software Engineering
  • Tao Xie + 1 more

Unit testing, a common step in software development, presents a challenge. When produced manually, unit test suites are often insufficient to identify defects. The main alternative is to use one of a variety of automatic unit-test generation tools: these are able to produce and execute a large number of test inputs that extensively exercise the unit under test. However, without a priori specifications, programmers need to manually verify the outputs of these test executions, which is generally impractical. To reduce this cost, unit-test selection techniques may be used to help select a subset of automatically generated test inputs. Then programmers can verify their outputs, equip them with test oracles, and put them into the existing test suite. In this paper, we present the operational violation approach for unit-test generation and selection, a black-box approach without requiring a priori specifications. The approach dynamically generates operational abstractions from executions of the existing unit test suite. These operational abstractions guide test generation tools to generate tests to violate them. The approach selects those generated tests violating operational abstractions for inspection. These selected tests exercise some new behavior that has not been exercised by the existing tests. We implemented this approach by integrating the use of Daikon (a dynamic invariant detection tool) and Parasoft Jtest (a commercial Java unit testing tool), and conducted several experiments to assess the approach.

  • Research Article
  • 10.3390/e28010074
IGTG&R: An Intent Analysis-Guided Unit Test Generation and Refinement Framework
  • Jan 9, 2026
  • Entropy
  • Xiaojian Liu + 1 more

Code coverage-guided unit test generation (CGTG) and large language model-based test generation (LLMTG) are two principal approaches for the generation of unit tests. Each of these approaches has its inherent advantages and drawbacks. Tests generated by CGTG have been shown to exhibit high code coverage and high executability. However, they lack the capacity to comprehend code intent, which results in an inability to identify deviations between code implementation and design intent (i.e., functional defects). Conversely, although LLMTG demonstrates an advantage in terms of code intent analysis, it is generally characterized by low executability and necessitates iterative debugging. In order to enhance the ability of unit test generation to identify functional defects, a novel framework has been proposed, entitled the intent analysis-guided unit test generation and refinement (IGTG&R) model. The IGTG&R model consists of a two-stage process for test generation. In the first stage, we introduce coverage path entropy to enhance CGTG to achieve high executability and code coverage of test cases. The second stage refines the test cases using LLMs to identify functional defects. We quantify and verify the interference of incorrect code implementation on intent analysis through conditional entropy. In order to reduce this interference, the focal method body is excluded from the code context information during intent analysis. Using these two-stage process, IGTG&R achieves a more profound comprehension of the intent of the code and the identification of functional defects. The IGTG&R model has been demonstrated to achieve an identification rate of functional defects ranging from 65% to 89%, with an execution success rate of 100% and a code coverage rate of 75.8%. This indicates that IGTG&R is superior to the CGTG and LLMTG approaches in multiple aspects.

  • Research Article
  • Cite Count Icon 80
  • 10.1145/3660783
Evaluating and Improving ChatGPT for Unit Test Generation
  • Jul 12, 2024
  • Proceedings of the ACM on Software Engineering
  • Zhiqiang Yuan + 6 more

Unit testing plays an essential role in detecting bugs in functionally-discrete program units ( e.g. , methods). Manually writing high-quality unit tests is time-consuming and laborious. Although the traditional techniques are able to generate tests with reasonable coverage, they are shown to exhibit low readability and still cannot be directly adopted by developers in practice. Recent work has shown the large potential of large language models (LLMs) in unit test generation. By being pre-trained on a massive developer-written code corpus, the models are capable of generating more human-like and meaningful test code. In this work, we perform the first empirical study to evaluate the capability of ChatGPT ( i.e ., one of the most representative LLMs with outstanding performance in code generation and comprehension) in unit test generation. In particular, we conduct both a quantitative analysis and a user study to systematically investigate the quality of its generated tests in terms of correctness, sufficiency, readability, and usability. We find that the tests generated by ChatGPT still suffer from correctness issues, including diverse compilation errors and execution failures (mostly caused by incorrect assertions); but the passing tests generated by ChatGPT almost resemble manually-written tests by achieving comparable coverage, readability, and even sometimes developers’ preference. Our findings indicate that generating unit tests with ChatGPT could be very promising if the correctness of its generated tests could be further improved. Inspired by our findings above, we further propose ChatTester , a novel ChatGPT-based unit test generation approach, which leverages ChatGPT itself to improve the quality of its generated tests. Chat Tester incorporates an initial test generator and an iterative test refiner. Our evaluation demonstrates the effectiveness of ChatTester by generating 34.3 % more compilable tests and 18.7 % more tests with correct assertions than the default ChatGPT. In addition to ChatGPT, we further investigate the generalization capabilities of ChatTester by applying it to two recent open-source LLMs ( i.e. , CodeLlama-Instruct and CodeFuse) and our results show that ChatTester can also improve the quality of tests generated by these LLMs.

  • Research Article
  • Cite Count Icon 10
  • 10.1002/stvr.1838
JUGE: An infrastructure for benchmarking Java unit test generators
  • Dec 20, 2022
  • Software Testing, Verification and Reliability
  • Xavier Devroey + 6 more

SummaryResearchers and practitioners have designed and implemented various automated test case generators to support effective software testing. Such generators exist for various languages (e.g., Java, C#, or Python) and various platforms (e.g., desktop, web, or mobile applications). The generators exhibit varying effectiveness and efficiency, depending on the testing goals they aim to satisfy (e.g., unit‐testing of libraries versus system‐testing of entire applications) and the underlying techniques they implement. In this context, practitioners need to be able to compare different generators to identify the most suited one for their requirements, while researchers seek to identify future research directions. This can be achieved by systematically executing large‐scale evaluations of different generators. However, executing such empirical evaluations is not trivial and requires substantial effort to select appropriate benchmarks, setup the evaluation infrastructure, and collect and analyse the results. In this Software Note, we present ourJUnit Generation Benchmarking Infrastructure(JUGE) supporting generators (search‐based, random‐based, symbolic execution, etc.) seeking to automate the production of unit tests for various purposes (validation, regression testing, fault localization, etc.). The primary goal is to reduce the overall benchmarking effort, ease the comparison of several generators, and enhance the knowledge transfer between academia and industry by standardizing the evaluation and comparison process. Since 2013, several editions of a unit testing tool competition, co‐located with the Search‐Based Software Testing Workshop, have taken place whereJUGEwas used and evolved. As a result, an increasing amount of tools (over 10) from academia and industry have been evaluated onJUGE, matured over the years, and allowed the identification of future research directions. Based on the experience gained from the competitions, we discuss the expected impact ofJUGEin improving the knowledge transfer on tools and approaches for test generation between academia and industry. Indeed, theJUGEinfrastructure demonstrated an implementation design that is flexible enough to enable the integration of additional unit test generation tools, which is practical for developers and allows researchers to experiment with new and advanced unit testing tools and approaches.

Save Icon
Up Arrow
Open/Close
Notes

Save Important notes in documents

Highlight text to save as a note, or write notes directly

You can also access these Documents in Paperpal, our AI writing tool

Powered by our AI Writing Assistant