Boosting Metamorphic Testing: A General Metamorphic Specification Language and A Supporting System

Abstract

Metamorphic testing (MT) is a black-box testing technique to alleviate the oracle problem via leveraging Metamorphic Relations (MRs) based on the domain knowledge of the software under test. In recent years, software testing researchers have made significant research advancements on MT in fundamental theories (e.g., MR identification and composition), methodologies (e.g., test input generation), and fault detection effectiveness in various application domains. However, there still exist some major challenges of MT yet to be addressed, such as the needs for a general MR description language, and an automated system that supports all major steps of MT and integrates the various MT tasks. To address these challenges, we have developed a general MR description language (called the Category-Choice Metamorphic specification Language; abbreviated as CCML), through which an automated supporting tool (called the Category-Choice Metamorphic testing Tool; abbreviated as CCMT) integrating the various MT tasks has been built. CCMT supports the automatic generation of test inputs and composite MRs and integrates various run-time optimization strategies. We have also conducted empirical studies to evaluate the expressiveness of CCML and the performance of CCMT in various testing aspects. Overall, our empirical findings are encouraging, and have demonstrated the merits of CCML and CCMT. In this regard, our work contributes to improving the fault detection effectiveness, efficiency, and practicality of MT and, hence, brings the use of MT to a new height.

Similar Papers
  • Conference Article
  • Cited by 20
  • 10.1145/2896971.2896977
The impact of source test case selection on the effectiveness of metamorphic testing
  • May 14, 2016
  • Arlinta Christy Barus + 4 more

Metamorphic Testing (MT) aims to alleviate the oracle problem. In MT, testers define metamorphic relations (MRs) which are used to generate new test cases (referred to as follow-up test cases) from the available test cases (referred to as source test cases). Both source and follow-up test cases are executed and their outputs are verified against the relevant MRs, of which any violation implies that the software under test is faulty. So far, the research on the effectiveness of MT has been focused on the selection of better MRs (that is, MRs that are more likely to be violated). In addition to MR selection, the source and follow-up test cases may also affect the effectiveness of MT. Since follow-up test cases are defined by the source test cases and MRs, selection of source test cases will then affect the effectiveness of MT. However, in existing MT studies, random testing is commonly adopted as the test case selection strategy for source test cases. This study aims to investigate the impact of source test cases on the effectiveness of MT. Since Adaptive Random Testing (ART) has been developed as an enhancement to Random Testing (RT), this study will focus on comparing the performance of RT and ART as source test case selection strategies on the effectiveness of MT. Experiment results show that ART outperforms RT on enhancing the effectiveness of MT.
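The source/follow-up workflow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the MR `sin(x) = sin(pi - x)`, the helper names (`follow_up`, `violates_mr`, `run_mt`), and the seeded fault in `buggy_sin` are all hypothetical and do not come from the paper. Source test cases are drawn by plain random testing here; an ART strategy would replace only the line that builds `sources`.

```python
import math
import random


def follow_up(x):
    """Derive the follow-up input from a source input for the
    assumed MR: sin(x) == sin(pi - x)."""
    return math.pi - x


def violates_mr(sut, x, tol=1e-9):
    """Run the SUT on the source and follow-up inputs and report
    whether their outputs violate the MR (beyond a float tolerance)."""
    return abs(sut(x) - sut(follow_up(x))) > tol


def run_mt(sut, sources):
    """Return the source inputs whose source/follow-up pair violates the MR."""
    return [x for x in sources if violates_mr(sut, x)]


def buggy_sin(x):
    # Hypothetical seeded fault: a truncated Taylor series that is
    # only accurate close to zero.
    return x - x**3 / 6


# Source test cases selected by random testing (RT), seeded for repeatability.
rng = random.Random(0)
sources = [rng.uniform(0.0, math.pi) for _ in range(100)]

print("violations, math.sin :", len(run_mt(math.sin, sources)))
print("violations, buggy_sin:", len(run_mt(buggy_sin, sources)))
```

No expected output is needed for either call: a violation is decided purely by comparing the two executions, which is how MT sidesteps the oracle problem.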

  • Research Article
  • Cited by 11
  • 10.1016/j.infsof.2020.106507
Validating class integration test order generation systems with Metamorphic Testing
  • Dec 16, 2020
  • Information and Software Technology
  • Miao Zhang + 3 more

  • Conference Article
  • Cited by 8
  • 10.1145/2896971.2896980
Generating source inputs for metamorphic testing using dynamic symbolic execution
  • May 14, 2016
  • Eman Alatawi + 2 more

Metamorphic testing uses domain-specific properties about a program’s intended behaviour to alleviate the oracle problem. From a given set of source test inputs, a set of follow-up test inputs are generated which have some relation to the source inputs, and their outputs are compared to outputs from the source tests, using metamorphic relations. We evaluate the use of an automated test input generation technique called dynamic symbolic execution (DSE) to generate the source test inputs for metamorphic testing. We investigate whether DSE increases source-code coverage and fault finding effectiveness of metamorphic testing compared to the use of random testing, and whether the use of metamorphic relations as a supportive technique improves the test inputs generated by DSE. Our results show that DSE improves the coverage and fault detection rate of metamorphic testing compared to random testing using significantly smaller test suites, and the use of metamorphic relations increases code coverage of both DSE and random tests considerably, but the improvement in the fault detection rate may be marginal and depends on the used metamorphic relations.

  • Research Article
  • Cited by 9
  • 10.1504/ijwgs.2020.110945
An iterative metamorphic testing technique for web services and case studies
  • Jan 1, 2020
  • International Journal of Web and Grid Services
  • Chang Ai Sun + 6 more

Metamorphic testing (MT) is an innovative approach to alleviating the oracle problem in software testing, which uses metamorphic relations of the program under test, instead of the test oracles, to verify its outputs. To alleviate the oracle problem of testing web services, we had previously proposed an MT framework for web services. In this paper, we further improve the efficiency and automation of this framework by leveraging metamorphic relations to iteratively generate test cases. We present a fixed-size iterative MT algorithm and implement it in the MT framework. We conduct three case studies to evaluate the fault detection effectiveness and efficiency of the proposed approach. Experimental results suggest that, compared with the conventional MT, iterative MT can achieve a comparable fault detection effectiveness, but with significantly fewer resources. Observations and limitations are summarised to provide new insights into the application of iterative MT.

  • Conference Article
  • Cited by 21
  • 10.1109/aitest.2019.00019
Fault Detection Effectiveness of Metamorphic Relations Developed for Testing Supervised Classifiers
  • Apr 1, 2019
  • Prashanta Saha + 1 more

In machine learning, supervised classifiers are used to obtain predictions for unlabeled data by inferring prediction functions using labeled data. Supervised classifiers are widely applied in domains such as computational biology, computational physics and healthcare to make critical decisions. However, it is often hard to test supervised classifiers since the expected answers are unknown. This is commonly known as the oracle problem and metamorphic testing (MT) has been used to test such programs. In MT, metamorphic relations (MRs) are developed from intrinsic characteristics of the software under test (SUT). These MRs are used to generate test data and to verify the correctness of the test results without the presence of a test oracle. Effectiveness of MT heavily depends on the MRs used for testing. In this paper we have conducted an extensive empirical study to evaluate the fault detection effectiveness of MRs that have been used in multiple previous studies to test supervised classifiers. Our study uses a total of 709 reachable mutants generated by multiple mutation engines and uses data sets with varying characteristics to test the SUT. Our results reveal that only 14.8% of these mutants are detected using the MRs and that the fault detection effectiveness of these MRs does not scale with the increased number of mutants when compared to what was reported in previous studies.

  • Conference Article
  • Cited by 41
  • 10.1145/3180155.3182528
Metamorphic testing of RESTful web APIs
  • May 27, 2018
  • Sergio Segura + 3 more

Web Application Programming Interfaces (APIs) specify how to access services and data over the network, typically using Web services. Web APIs are rapidly proliferating as a key element to foster reusability, integration, and innovation, enabling new consumption models such as mobile or smart TV apps. Companies such as Facebook, Twitter, Google, eBay or Netflix receive billions of API calls every day from thousands of different third-party applications and devices, which constitutes more than half of their total traffic. As Web APIs are progressively becoming the cornerstone of software integration, their validation is getting more critical. In this context, the fast detection of bugs is of utmost importance to increase the quality of internal products and third-party applications. However, testing Web APIs is challenging mainly due to the difficulty to assess whether the output of an API call is correct, i.e., the oracle problem. For instance, consider the Web API of the popular music streaming service Spotify. Suppose a search for albums with the query redhouse returning 21 total matches: Is this output correct? Do all the albums in the result set contain the keyword? Are there any albums containing the keyword not included in the result set? Answering these questions is difficult, even with small result sets, and often infeasible when the results are counted by thousands or millions. Metamorphic testing alleviates the oracle problem by providing an alternative when the expected output of a test execution is complex or unknown. Rather than checking the output of an individual program execution, metamorphic testing checks whether multiple executions of the program under test fulfil certain necessary properties called metamorphic relations. For instance, consider the following metamorphic relation in Spotify: two searches for albums with the same query should return the same number of total results regardless of the size of pagination. 
Suppose that a new Spotify search is performed using the exact same query as before and increasing the maximum number of results per page from 20 (default value) to 50: This search returns 27 total albums (6 more matches than in the previous search), which reveals a bug. This is an example of a real and reproducible fault detected using the approach presented in this paper and reported to Spotify. According to Spotify developers, it was a regression fault caused by a fix with undesired side effects. In this paper [1], we present a metamorphic testing approach for the automated detection of faults in RESTful Web APIs (henceforth also referred to as simply Web APIs). We introduce the concept of metamorphic relation output patterns. A Metamorphic Relation Output Pattern (MROP) defines an abstract output relation typically identified in Web APIs, regardless of their application domain. Each MROP is defined in terms of set operations among test outputs such as equality, union, subset, or intersection. MROPs provide a helpful guide for the identification of metamorphic relations, broadening the scope of our work beyond a particular Web API. Based on the notion of MROP, a methodology is proposed for the application of the approach to any Web API following the REST architectural pattern. The approach was evaluated in several steps. First, we used the proposed methodology to identify 33 metamorphic relations in four Web APIs developed by undergraduate students. All the relations are instances of the proposed MROPs. Then, we assessed the effectiveness of the identified relations at revealing 317 automatically seeded faults (i.e., mutants) in the APIs under test. As a result, 302 seeded faults were detected, achieving a mutation score of 95.3%. Second, we evaluated the approach using real Web APIs and faults. In particular, we identified 20 metamorphic relations in the Web API of Spotify and 40 metamorphic relations in the Web API of YouTube. 
Each metamorphic relation was implemented and automatically executed using both random and manual test data. In total, 469K metamorphic tests were generated. As a result, 21 metamorphic relations were violated, and 11 issues revealed and reported (3 issues in Spotify and 8 issues in YouTube). To date, 10 of the reported issues have been either confirmed by the API developers or reproduced by other users supporting the effectiveness of our approach.
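The pagination relation from the Spotify example, and the idea of expressing output relations as set operations, can be sketched as follows. This is a minimal illustration, not the paper's implementation: `search` and `buggy_search` are hypothetical in-memory stand-ins for a paginated endpoint (no real Web API is called), and the seeded fault is invented for the demonstration.

```python
def search(catalog, query, limit=20):
    """Hypothetical stand-in for a paginated search endpoint:
    returns the total match count and the first page of results."""
    matches = [title for title in catalog if query.lower() in title.lower()]
    return {"total": len(matches), "items": matches[:limit]}


def buggy_search(catalog, query, limit=20):
    # Hypothetical seeded fault: the total is computed over a window
    # whose size depends on the page size.
    window = catalog[: limit * 2]
    matches = [title for title in window if query.lower() in title.lower()]
    return {"total": len(matches), "items": matches[:limit]}


def equality_mr_holds(api, catalog, query, source_limit=20, follow_up_limit=50):
    """Equality relation: the same query must report the same total
    regardless of the page size."""
    source = api(catalog, query, limit=source_limit)
    follow = api(catalog, query, limit=follow_up_limit)
    return source["total"] == follow["total"]


def subset_mr_holds(api, catalog, query, source_limit=20, follow_up_limit=50):
    """Subset relation: the smaller page must be contained in the larger one."""
    source = api(catalog, query, limit=source_limit)
    follow = api(catalog, query, limit=follow_up_limit)
    return set(source["items"]) <= set(follow["items"])


catalog = [f"Redhouse Sessions Vol. {i}" for i in range(60)] + ["Blue Album"]

print("search, equality MR      :", equality_mr_holds(search, catalog, "redhouse"))
print("search, subset MR        :", subset_mr_holds(search, catalog, "redhouse"))
print("buggy_search, equality MR:", equality_mr_holds(buggy_search, catalog, "redhouse"))
```

Neither check needs to know the correct total for the query. The fault surfaces only because two executions that should agree do not.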

  • Research Article
  • 10.1145/3728972
Improving Deep Learning Framework Testing with Model-Level Metamorphic Testing
  • Jun 22, 2025
  • Proceedings of the ACM on Software Engineering
  • Yanzhou Mu + 8 more

Deep learning (DL) frameworks are essential to DL-based software systems, and framework bugs may lead to substantial disasters, thus requiring effective testing. Researchers adopt DL models or single interfaces as test inputs and analyze their execution results to detect bugs. However, floating-point errors, inherent randomness, and the complexity of test inputs make it challenging to analyze execution results effectively, leading to existing methods suffering from a lack of suitable test oracles. Some researchers utilize metamorphic testing to tackle this challenge. They design Metamorphic Relations (MRs) based on input data and parameter settings of a single framework interface to generate equivalent test inputs, ensuring consistent execution results between original and generated test inputs. Despite their promising effectiveness, they still face certain limitations. (1) Existing MRs overlook structural complexity, limiting test input diversity. (2) Existing MRs focus on limited interfaces, which limits generalization and necessitates additional adaptations. (3) Their detected bugs are related to the result consistency of single interfaces and far from those exposed in multi-interface combinations and runtime metrics (e.g., resource usage). To address these limitations, we propose ModelMeta, a model-level metamorphic testing method for DL frameworks with four MRs focused on the structure characteristics of DL models. ModelMeta augments seed models with diverse interface combinations to generate test inputs with consistent outputs, guided by the QR-DQN strategy. It then detects bugs through fine-grained analysis of training loss/gradients, memory/GPU usage, and execution time. We evaluate the effectiveness of ModelMeta on three popular DL frameworks (i.e., MindSpore, PyTorch, and ONNX) with 17 DL models from ten real-world tasks ranging from image classification to object detection. 
Results demonstrate that ModelMeta outperforms state-of-the-art baselines from the perspective of test coverage and diversity of generated test inputs. Regarding bug detection, ModelMeta has identified 31 new bugs, of which 27 have been confirmed, and 11 have been fixed. Among them, seven bugs existing methods cannot detect, i.e., five wrong resource usage bugs and two low-efficiency bugs. These results demonstrate the practicality of our method.

  • Research Article
  • Cited by 1
  • 10.3390/jimaging10040087
Measuring Effectiveness of Metamorphic Relations for Image Processing Using Mutation Testing
  • Apr 6, 2024
  • Journal of Imaging
  • Fakeeha Jafari + 1 more

Testing an intricate plexus of advanced software system architecture is quite challenging due to the absence of test oracle. Metamorphic testing is a popular technique to alleviate the test oracle problem. The effectiveness of metamorphic testing is dependent on metamorphic relations (MRs). MRs represent the essential properties of the system under test and are evaluated by their fault detection rates. The existing techniques for the evaluation of MRs are not comprehensive, as very few mutation operators are used to generate very few mutants. In this research, we have proposed six new MRs for dilation and erosion operations. The fault detection rate of six newly proposed MRs is determined using mutation testing. We have used eight applicable mutation operators and determined their effectiveness. By using these applicable operators, we have ensured that all the possible numbers of mutants are generated, which shows that all the faults in the system under test are fully identified. Results of the evaluation of four MRs for edge detection show an improvement in all the respective MRs, especially in MR1 and MR4, with a fault detection rate of 76.54% and 69.13%, respectively, which is 32% and 24% higher than the existing technique. The fault detection rate of MR2 and MR3 is also improved by 1%. Similarly, results of dilation and erosion show that out of 8 MRs, the fault detection rates of four MRs are higher than the existing technique. In the proposed technique, MR1 is improved by 39%, MR4 is improved by 0.5%, MR6 is improved by 17%, and MR8 is improved by 29%. We have also compared the results of our proposed MRs with the existing MRs of dilation and erosion operations. Results show that the proposed MRs complement the existing MRs effectively as the new MRs can find those faults that are not identified by the existing MRs.

  • Research Article
  • Cited by 67
  • 10.1016/j.jss.2015.07.037
METRIC: METamorphic Relation Identification based on the Category-choice framework
  • Jul 30, 2015
  • Journal of Systems and Software
  • Tsong Yueh Chen + 2 more

  • Research Article
  • Cited by 25
  • 10.1109/tse.2020.3009698
Theoretical and Empirical Analyses of the Effectiveness of Metamorphic Relation Composition
  • Jul 21, 2020
  • IEEE Transactions on Software Engineering
  • Kun Qiu + 3 more

Metamorphic Relations (MRs) play a key role in determining the fault detection capability of Metamorphic Testing (MT). As human judgement is required for MR identification, systematic MR generation has long been an important research area in MT. Additionally, due to the extra program executions required for follow-up test cases, some concerns have been raised about MT cost-effectiveness. Consequently, the reduction in testing costs associated with MT has become another important issue to be addressed. MR composition can address both of these problems. This technique can automatically generate new MRs by composing existing ones, thereby reducing the number of follow-up test cases. Despite this advantage, previous studies on MR composition have empirically shown that some composite MRs have lower fault detection capability than their corresponding component MRs. To investigate this issue, we performed theoretical and empirical analyses to identify what characteristics component MRs should possess so that their corresponding composite MR has at least the same fault detection capability as the component MRs do. We have also derived a convenient, but effective guideline so that the fault detection capability of MT will most likely not be reduced after composition.
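To make the cost argument concrete, MR composition can be sketched for relations of the form `f(t(x)) == g(f(x))`. Everything below is an assumption made for illustration: the `MR` dataclass, the `compose` and `holds` helpers, and the two sine relations are hypothetical and are not taken from the paper. The sketch also says nothing about whether a composite MR keeps the fault detection capability of its components, which is the question the paper analyses.

```python
import math
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class MR:
    """An MR of the assumed form: sut(transform_input(x)) == expected_output(sut(x))."""
    name: str
    transform_input: Callable[[float], float]
    expected_output: Callable[[float], float]


def compose(first: MR, second: MR) -> MR:
    """Chain two MRs: apply `first`, then `second`, to both the input
    transformation and the expected-output mapping."""
    return MR(
        name=f"{second.name}({first.name})",
        transform_input=lambda x: second.transform_input(first.transform_input(x)),
        expected_output=lambda y: second.expected_output(first.expected_output(y)),
    )


def holds(mr: MR, sut: Callable[[float], float], x: float, tol: float = 1e-9) -> bool:
    """Check the MR with one source execution and one follow-up execution."""
    actual = sut(mr.transform_input(x))
    expected = mr.expected_output(sut(x))
    return abs(actual - expected) <= tol


# Two component MRs for sine (illustrative choices).
mr1 = MR("MR1", lambda x: math.pi - x, lambda y: y)    # sin(pi - x) == sin(x)
mr2 = MR("MR2", lambda x: x + math.pi, lambda y: -y)   # sin(x + pi) == -sin(x)

# Composite MR: sin(2*pi - x) == -sin(x).
composite = compose(mr1, mr2)

print(composite.name, "holds for math.sin at x=0.7:", holds(composite, math.sin, 0.7))
```

Checking `mr1` and `mr2` separately costs two follow-up executions per source input, while checking `composite` costs one.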

  • Conference Article
  • Cited by 10
  • 10.1109/saner56733.2023.00109
Bug or not Bug? Analysing the Reasons Behind Metamorphic Relation Violations
  • Mar 1, 2023
  • Alejandra Duque-Torres + 3 more

Metamorphic Testing (MT) is a testing technique that can effectively alleviate the oracle problem. MT uses Metamorphic Relations (MRs) to determine if a test case passes or fails. MRs specify how the outputs should vary in response to specific input changes when executing the System Under Test (SUT). If a particular MR is violated for at least one test input (and its change), there is a high probability that the SUT has a fault. On the other hand, if a particular MR is not violated, it does not guarantee that the SUT is fault free. However, deciding whether an MR is violated due to a bug or because the MR does not hold for particular conditions generated by specific inputs remains a manual and unexplored task. In this paper, we develop a method for refining MRs to offer hints as to whether a violation results from a bug or arises from the MR not being matched to certain test data under specific circumstances. In our initial proof-of-concept, we derive the relevant information from rules using the Association Rule Mining (ARM) technique. We validate our method on a toy example and discuss the lessons learned from our experiments. Our proof-of-concept demonstrates that our method is applicable and that we can provide suggestions that help strengthen the test suite for regression testing purposes.

  • Conference Article
  • Cited by 1
  • 10.1109/compsac.2016.48
An Approach for Iteratively Generating Adequate Tests in Metamorphic Testing: A Case Study
  • Jun 1, 2016
  • Junhua Ding + 1 more

Metamorphic testing is an effective technique for testing "non-testable" programs. But the quality of metamorphic testing is highly dependent on the selection of metamorphic relations and the test generation. This paper introduces an approach for iteratively developing metamorphic relations and producing adequate tests guided by testing and test evaluation results. The approach includes a framework for the development of metamorphic relations and tests, and a strategy for iteratively refining the relations and tests for generating adequate tests. The test adequacy evaluation is built on the evaluation of test coverage criteria, mutation testing, and testing of mutated metamorphic relations. The approach and its effectiveness are discussed through testing a Monte Carlo modeling program.

  • Conference Article
  • Cited by 1
  • 10.54941/ahfe1004569
Exploring Metamorphic Testing for Self-learning Functions with User Interactions
  • Jan 1, 2024
  • Marco Stang + 3 more

Self-learning functions, an evolving field in modern technology, are increasingly being integrated into a multitude of applications. They primarily rely on data-driven learning techniques, such as supervised, unsupervised machine learning and reinforcement learning. In the field of autonomous vehicles, self-learning functions are important for real-time decision-making, as they adapt to dynamic scenarios by collecting extensive data from sensors. Likewise, self-learning functions with user-interaction, a subset of self-learning functions, are asserting their influence in the automotive industry, as they observe driver behavior and recognize user-specific interactions with the system. In addition to data-driven learning, these functions incorporate real-time user interactions, such as activating seat heating or ventilation, and autonomously execute these interactions, enhancing overall comfort of the driver. The growing integration and interaction of self-learning functions underscore the importance of conducting research and refining testing methodologies to ensure their reliability and effectiveness. To meet the growing need for trusted and reliable self-learning features, effective testing methods are essential for validating the accuracy and robustness of self-learning functions with user interaction. In contrast to traditional software, self-learning functions change and adjust themselves based on data and interactions. This causes challenges to predict and verify their intended behavior. Furthermore, each potential user exhibits distinctive individual behavior that differentiates them from other users. As a consequence, attempting to address every potential user interaction with traditional testing methods and predefined test case specifications becomes impractical. Moreover, the behavior of a self-learning function adapts over time to that of the respective user. 
As a result, the user behavior to be tested evolves, rendering traditional testing through predefined test cases unfeasible. Consequently, adapted testing methods are indispensable to effectively address the test-oracle problem. This paper presents a solution to address the test-oracle problem by leveraging metamorphic testing as a method for validating self-learning functions with user interaction. Metamorphic testing approaches the problem of the test-oracle from a perspective not typically employed by other testing strategies: instead of focusing on individual test cases, metamorphic tests examine the outcomes of multiple test cases within a testing system and their relationships with each other. Metamorphic testing assesses whether the test inputs and outputs fulfill specific metamorphic relationships upon multiple test executions. These metamorphic relationships describe the essential properties of the intended functionality. They transform existing input-output test cases into new follow-up test cases. If the behavior of the self-learning functions deviates from the metamorphic relationship in these original and follow-up test cases, the testing system is considered faulted. The effectiveness of fault detection significantly relies on metamorphic relations. Thus, the analysis of metamorphic relations becomes an essential task and a creative endeavor for the tester. Furthermore, it will be a significant contribution of this publication. The proposed paper offers an analysis of the insights gained from the application of metamorphic testing to a self-learning comfort function. This underscores how effective the testing methodology is in identifying inaccuracies in the self-learning function's interpretation of user behaviors, thus contributing to our understanding of their reliability and adaptability in simulated scenarios. 
In conclusion, the utilization of metamorphic testing in the context of self-learning functions, with a specific emphasis on user interactions, emerges as a promising and efficient strategy.

  • Conference Article
  • Cited by 4
  • 10.1109/compsac54236.2022.00274
Using Metamorphic Relation Violation Regions to Support a Simulation Framework for the Process of Metamorphic Testing
  • Jun 1, 2022
  • Zhihao Ying + 4 more

Metamorphic testing (MT) has been growing in popularity, but it can still be quite challenging and time-consuming to assess its performance. Typical approaches to performance assessment can require a series of steps, and depend on a variety of factors, often requiring serendipity. This can be a bottleneck for some aspects of MT research. Central to MT, metamorphic relations (MRs) represent necessary properties of the system under test (SUT). In traditional software testing, simulations are often employed to examine and compare the performance of different testing strategies. However, these simulations are typically designed based on the assumed availability (and applicability) of a test oracle - a mechanism to decide the correctness of the SUT output or behaviour. A key reason for the popularity of MT is its proven record of effective software testing, without the need for a test oracle. This strength, however, also means that traditional ways of using simulations to analyse software testing approaches are not applicable for MT. This lack of cheap and fast ways to conduct simulation analyses of MT is a hurdle for many aspects of MT research, and may be an obstacle to its more widespread adoption. To address this, in this paper we introduce the concept of MR-violation regions (MRVRs), and show how they can be used for a certain category of MRs, Deterministic MRs (DMRs), to build simulation tools for MT. We analyse the differences between MRVRs and traditional, oracle-defined failure regions; and report on a preliminary case study exploring MRVRs in numerical-input-domain systems from previous MT studies. We anticipate that the proposed MT simulation framework may facilitate more research into MT, and may help lead to its more widespread adoption.

  • Conference Article
  • Cited by 2
  • 10.1109/seaa56994.2022.00059
Metamorphic Testing in Autonomous System Simulations
  • Aug 1, 2022
  • Jubril Gbolahan Adigun + 2 more

Metamorphic testing has proven to be effective for test case generation and fault detection in many domains. It is a software testing strategy that uses certain relations between input-output pairs of a program, referred to as metamorphic relations. This approach is relevant in the autonomous systems domain since it helps in cases where the outcome of a given test input may be difficult to determine. In this paper, therefore, we provide an overview of metamorphic testing as well as an implementation in the autonomous systems domain. We implement an obstacle detection and avoidance task in autonomous drones utilising the GNC API alongside a simulation in Gazebo. In particular, we describe properties and best practices that are crucial for the development of effective metamorphic relations. We also demonstrate two metamorphic relations for metamorphic testing of a single drone and of multiple drones, respectively. Our relations reveal several properties and some weak spots of both the implementation and the avoidance algorithm in the light of metamorphic testing. The results indicate that metamorphic testing has great potential in the autonomous systems domain and should be considered for quality assurance in this field.
