Enabling Cost-Effective UI Automation Testing with Retrieval-Based LLMs: A Case Study in WeChat
UI automation tests play a crucial role in ensuring the quality of mobile applications. Despite the growing popularity of machine learning techniques for generating these tests, such techniques still face several challenges, such as the mismatch of UI elements. Recent advances in Large Language Models (LLMs) have addressed these issues by leveraging their semantic understanding capabilities. However, a significant gap remains in applying these models to industrial-level app testing, particularly in terms of cost optimization and knowledge limitation. To address this, we introduce CAT, which creates cost-effective UI automation tests for industry apps by combining machine learning and LLMs with best practices. Given the task description, CAT employs Retrieval Augmented Generation (RAG) to source examples of industrial app usage as the few-shot learning context, assisting LLMs in generating the specific sequence of actions. CAT then employs machine learning techniques, with LLMs serving as a complementary optimizer, to map the target element on the UI screen. Our evaluations on the WeChat testing dataset demonstrate CAT's performance and cost-effectiveness, achieving 90% UI automation with $0.34 cost, outperforming the state-of-the-art. We have also integrated our approach into the real-world WeChat testing platform, demonstrating its usefulness in detecting 141 bugs and enhancing the developers' testing process.
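The retrieval step described in this abstract can be illustrated with a minimal sketch. It is not CAT's implementation: the example store, the bag-of-words cosine similarity, and the names `EXAMPLES`, `retrieve`, and `build_prompt` are all assumptions made for illustration, and a real system would more likely use learned embeddings and a vector index.

```python
import math
import re
from collections import Counter

# Hypothetical usage examples (task description, action sequence).
# The real corpus used by CAT is not given in the abstract.
EXAMPLES = [
    ("send a text message to a contact",
     ["tap Chats", "tap contact", "type message", "tap Send"]),
    ("post a photo to moments",
     ["tap Discover", "tap Moments", "tap camera icon", "choose photo", "tap Post"]),
    ("add a new contact by phone number",
     ["tap Contacts", "tap Add", "type number", "tap Search"]),
]


def bag_of_words(text):
    """Lower-case the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
        math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(task, k=2):
    """Return the k stored examples most similar to the task description."""
    query = bag_of_words(task)
    ranked = sorted(EXAMPLES,
                    key=lambda ex: cosine(query, bag_of_words(ex[0])),
                    reverse=True)
    return ranked[:k]


def build_prompt(task, k=2):
    """Assemble a few-shot prompt: retrieved examples first, then the new task."""
    parts = [f"Task: {desc}\nActions: {'; '.join(actions)}"
             for desc, actions in retrieve(task, k)]
    parts.append(f"Task: {task}\nActions:")
    return "\n\n".join(parts)


if __name__ == "__main__":
    print(build_prompt("send a message to my contact"))
```

The prompt would then be passed to an LLM, which is expected to complete the final "Actions:" line; that call is omitted here because the abstract does not say which model or API is used.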
- Research Article
2
- 10.9734/ajrcos/2025/v18i3590
- Feb 19, 2025
- Asian Journal of Research in Computer Science
Aim: The study examines Continuous Testing (CT) in a DevOps environment for cloud migration within the Property & Casualty (P&C) insurance industry and InsurTech companies. The study evaluates the impact of AI/ML-driven test automation, security testing, and performance validation using tools such as Selenium, JUnit, and TestNG and CI/CD pipelines such as Jenkins, GitHub Actions, and Azure DevOps. Experimental testing with comparative evaluations shows that a structured CT approach simplifies cloud migration projects, reduces defect propagation, and improves compliance ratios in regulated industries. Industry and Scientific Application: Beyond insurance, this study applies to other industries and scientific research. Continuous Testing drives innovation by detecting defects in real time, reducing deployment risks, and ensuring regulatory compliance. These findings can serve as a model for organizations integrating DevOps-driven testing in cloud migration. Benefits of Cloud Migration: CT in DevOps optimizes cloud migration by automating testing, reducing errors, and improving deployment efficiency. Companies using CT see 40-60% faster deployments and 35% fewer defects post-implementation. Automated security checks enhance compliance, while test automation lowers costs. These benefits make CT essential for a smooth, secure, and cost-effective cloud transition, especially in regulated sectors like finance, healthcare, and insurance. Case Studies and Real-World Application: Case studies of Liberty Mutual and Progressive Insurance have shown that Continuous Testing is effective and accelerates DevOps-centered cloud migration. Liberty Mutual used cloud-embedded automated test frameworks to reduce release time by 50% and achieve compliance, thereby cutting time-to-market. 
Progressive Insurance streamlined the testing of APIs and mobile applications, using CI/CD-integrated test automation to speed up claims processing by up to 30% and cut API failures by around 90%. These case studies show how Continuous Testing contributes to deployment efficiency, system resilience, and regulatory adherence in real-world insurance applications. Study Design: This study takes a mixed-methods approach that incorporates case studies, industry surveys, and experimental testing to assess the efficiency of Continuous Testing in cloud-migration strategies. The research targets insurance companies that have recently adopted DevOps-driven cloud migration and investigates their testing frameworks. Place and Duration of Study: This study is based on a review of industry practices, integration strategies, and cloud migration strategies at global insurance firms in North America and Asia-Pacific, focusing on solutions implemented between 2018 and 2024. Methodology: The study employs multiple research methods, including literature review, case studies, surveys, and experimentation, to assess the impact of continuous testing (CT) on the DevOps-driven cloud migration of P&C insurers. Following a detailed literature review of the current state of research into CT and cloud adoption in insurance, case studies demonstrate insurers using CT-based frameworks. Surveys and interviews with in-house IT and DevOps experts highlight the challenges and good practices. Through experimentation with automated testing tools such as Selenium, JUnit, and Jenkins, we measure the improvements in efficiency. A comparative analysis measures the performance indicators before and after the CT implementation. Results: Continuous testing (CT) substantially enhances cloud-migration efficiency for P&C insurance. 
Companies that have adopted CT in their DevOps practice have achieved 40-60% faster software release cycles, leading to faster deployments. Automated testing reduced post-deployment issues by 35%, thereby increasing software reliability. Compliance with industry requirements improved considerably because continuous security checks lessen risks. Another benefit was a reduction of about 20-30% in testing costs, as automation replaced manual testing. In addition, applications became more resilient to system failures and easier to support and maintain. Post-migration data showed applications reaching 99.9% uptime. In the heavily regulated insurance sector, continuous testing thus makes moving to the cloud much faster, more secure, and more cost-effective. Conclusion: Continuous Testing in DevOps significantly enhances cloud migration for P&C insurers by improving speed-to-market, quality, and compliance. According to this discussion, automation is the key enabler for cloud adoption, which in turn mitigates risk and improves operational agility. One could argue that P&C insurers' ability to adopt CT in the cloud era is essential to sustaining the progress of digital transformation.
- Conference Article
142
- 10.1109/icst.2011.11
- Mar 1, 2011
This paper presents experiences in model-based graphical user interface testing of Android applications. We present how model-based testing and test automation were implemented with Android, including how applications were modeled, how tests were designed and executed, and what kinds of problems were found in the tested application during the whole process. The main focus is on a case study that was performed with an Android application, the BBC News Widget. Our goal is to present actual data on the experiences and to discuss whether advantages can be gained using model-based testing when compared with traditional graphical user interface testing. Another contribution of this paper is a description of a keyword-based test automation tool that was implemented for the Android emulator during the case study. All the models and the tools created or used in this case study are available as open source.
- Research Article
2
- 10.1515/icom-2023-0029
- Nov 9, 2023
- i-com
Poor software quality results in avoidable costs of trillions of dollars annually in the United States alone. Augmented Reality (AR) applications are a relatively new software category. Currently there are no standards to guide the development process, and testing is predominantly ad hoc and manual. Consequently, design guidelines and software test automation techniques are intended to remedy the situation. Here, we present a concept for test automation of AR applications. The concept consists of two parts: design guidelines and a process model for testing AR applications, and a case study with a prototype application for test automation. The design guidelines and the process model are based on the state of the art. The prototype application presented in this article demonstrates test automation for a multi-platform AR application for Android devices as well as the HoloLens 2. The presented test automation case study is designed to cover a large part of the functions, such as the different interaction variants. This research work shows that by using the proposed process model and test automation techniques, testing of some features of AR applications can be automated. The results of this research can serve as a basis for future research and contribute towards AR application development standardization efforts.
- Research Article
63
- 10.14573/altex.1408041
- Nov 5, 2014
- ALTEX
SEURAT-1 is a European public-private research consortium that is working towards animal-free testing of chemical compounds and the highest level of consumer protection. A research strategy was formulated based on the guiding principle to adopt a toxicological mode-of-action framework to describe how any substance may adversely affect human health. The proof of the initiative will be in demonstrating the applicability of the concepts on which SEURAT-1 is built on three levels: (i) theoretical prototypes for adverse outcome pathways are formulated based on knowledge already available in the scientific literature on investigating the toxicological mode-of-actions leading to adverse outcomes (addressing mainly liver toxicity); (ii) adverse outcome pathway descriptions are used as a guide for the formulation of case studies to further elucidate the theoretical model and to develop integrated testing strategies for the prediction of certain toxicological effects (i.e., those related to the adverse outcome pathway descriptions); (iii) further case studies target the application of knowledge gained within SEURAT-1 in the context of safety assessment. The ultimate goal would be to perform ab initio predictions based on a complete understanding of toxicological mechanisms. In the near-term, it is more realistic that data from innovative testing methods will support read-across arguments. Both scenarios are addressed with case studies for improved safety assessment. A conceptual framework for a rational integrated assessment strategy emerged from designing the case studies and is discussed in the context of international developments focusing on alternative approaches for evaluating chemicals using the new 21st century tools for toxicity testing.
- Research Article
3
- 10.53735/cisse.v9i1.148
- Mar 8, 2022
- Journal of The Colloquium for Information Systems Security Education
Teaching college students ethical hacking skills is considered a necessary component of a computer security curriculum and an effective method for teaching defensive techniques. However, there is a shortage of textbooks and technical papers that describe the teaching materials and implementation of penetration testing techniques for hands-on exercises. In our teaching practice, we have been using case studies and course projects to help students learn the fundamental concepts, primary techniques, and commonly used tools of penetration testing. We think this is a beneficial complement to a cybersecurity course that is taught with a defensive approach. Through these activities, students have gained hands-on experience and developed their ethical hacking skills. Feedback from them is positive and student learning outcomes are promising. In this paper, we describe the principles of developing and implementing case studies and course projects, along with associated considerations for specified educational objectives when introducing penetration testing. An example case study and course project that we have been using in our courses are described to introduce the major design ideas and the activities needed to complete them. Experience, lessons, and feedback from students are discussed. Our results will provide a good point of reference for educators who teach a cybersecurity course at a college or university and would like to offer an introduction to ethical hacking. This work can also be a reference for a college that wants to integrate
- Research Article
19
- 10.1016/j.infsof.2016.08.008
- Aug 24, 2016
- Information and Software Technology
Full modification coverage through automatic similarity-based test case selection
- Book Chapter
2
- 10.1007/978-3-319-26396-0_2
- Dec 18, 2015
When developing a computer (mobile) game, testing the game is an important task and takes a large share of the development cost. So far, testing a game’s functional features has relied mainly on human testers, who personally play the game, and is thus labor intensive. This paper proposes a method that automates game testing. Since games are usually built on top of game frameworks, the idea is to enhance a game framework with a testing layer, which can execute (play back) test scripts that perform user events and assert the correctness of the game. We design an HTML5 game framework with such support. In addition, a case study is performed to compare the testing cost of three different methods: writing a test script directly, recording a test script, and testing the game directly by a human tester. The results showed that when repeated testing is necessary, automatic testing (either writing or recording test scripts) can reduce human cost. Among these three testing methods, recording scripts was the most favored.
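The record-and-playback idea in this abstract can be sketched in a few lines. The sketch below is written in Python rather than for the paper's HTML5 framework, and the toy game, the script format, and the names `CounterGame`, `RecordingLayer`, and `playback` are hypothetical; it only shows how a layer between the tester and the game can capture user events and assertions, then replay them against a fresh game instance.

```python
class CounterGame:
    """Toy stand-in for a game: tapping a coin raises the score, a bomb resets it."""

    def __init__(self):
        self.score = 0

    def handle(self, event):
        if event == "tap_coin":
            self.score += 1
        elif event == "tap_bomb":
            self.score = 0


class RecordingLayer:
    """Sits between the tester and the game and records a test script."""

    def __init__(self, game):
        self.game = game
        self.script = []

    def send(self, event):
        # Record the user event, then forward it to the game.
        self.script.append(("event", event))
        self.game.handle(event)

    def check_score(self):
        # Record the currently observed score as an assertion for later runs.
        self.script.append(("assert_score", self.game.score))


def playback(script, game):
    """Replay a recorded script; return False at the first failed assertion."""
    for kind, value in script:
        if kind == "event":
            game.handle(value)
        elif kind == "assert_score" and game.score != value:
            return False
    return True


if __name__ == "__main__":
    recorder = RecordingLayer(CounterGame())
    for event in ["tap_coin", "tap_coin"]:
        recorder.send(event)
    recorder.check_score()
    print(playback(recorder.script, CounterGame()))
```

Recording once and replaying many times is what makes the approach pay off when tests must be repeated, which matches the cost comparison the abstract reports.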
- Supplementary Content
1
- 10.17605/osf.io/4adp8
- Oct 30, 2021
- arXiv (Cornell University)
An educational portal (EP) is a multi-function website that allows access to activities such as public and private sections, data retrieval and submission, personalized content and so on for the educational system. This study investigated the specific requirements for enhancing the quality and behavior of an EP with regard to time and cost, using Obafemi Awolowo University (OAU), Ile-Ife, Nigeria as a case study. A test automation framework was designed using the Unified Modeling Language and implemented in the Java programming language. MySQL and Excel databases were used to store test data. The framework developed was evaluated using Test Time Performance (TTP), Performance Test Efficiency (PTE) and Automation Scripting Productivity (ASP) metrics. The results from the evaluation of the sample data provided showed that ASP produced a tested outcome of 360 operations per hour, PTE yielded 80% and TTP was just 4%. Based on the recorded performance, it is evident that the research can provide quick and firsthand information to quality assurance analysts and software testers, thereby reducing maintenance cost during software development.
- Supplementary Content
- 10.48550/arxiv.2111.00222
- Aug 30, 2021
- arXiv (Cornell University)
An educational portal (EP) is a multi-function website that allows access to activities such as public and private sections, data retrieval and submission, personalized content and so on for the educational system. This study investigated the specific requirements for enhancing the quality and behavior of an EP with regard to time and cost, using Obafemi Awolowo University (OAU), Ile-Ife, Nigeria as a case study. A test automation framework was designed using the Unified Modeling Language and implemented in the Java programming language. MySQL and Excel databases were used to store test data. The framework developed was evaluated using Test Time Performance (TTP), Performance Test Efficiency (PTE) and Automation Scripting Productivity (ASP) metrics. The results from the evaluation of the sample data provided showed that ASP produced a tested outcome of 360 operations per hour, PTE yielded 80% and TTP was just 4%. Based on the recorded performance, it is evident that the research can provide quick and firsthand information to quality assurance analysts and software testers, thereby reducing maintenance cost during software development.
- Research Article
2
- 10.11588/heidok.00017171
- Jan 1, 2014
- heiDOK (Heidelberg University)
The quality assurance of scientific software has to deal with special challenges of this type of software, including missing test oracles, the need for high performance computing, and the high priority of non-functional requirements. A scientific framework consists of common code, which provides solutions for several similar mathematical problems. The various possible uses of a scientific framework lead to a large variability in the framework. In addition to the challenges of scientific software, the quality assurance of a scientific framework needs to find a way of dealing with the large variability. In software product line engineering (SPLE), the idea is to develop a software platform and then use mass customization for the creation of a group of similar applications. In this thesis, we show how SPLE, in particular variability modeling, can be applied to support the quality assurance of scientific frameworks. One of the main contributions of this thesis is a process for the creation of reengineering variability models for a scientific framework based on its mathematical requirements. Reengineering means the adjustment of a software system to improve the software quality, mostly without changing the software’s functionality. In our research, the variability models are created for existing software and therefore we call them reengineering variability models. The created variability models are used for a systematic development of system test applications for the framework. Additionally, we developed a model-based method for test case derivation for the system test applications based on the variability models. Furthermore, we contribute a software product line test strategy for scientific frameworks. A test strategy strongly influences the test activities performed. 
Another main contribution of this thesis is the design of a quality assurance process for scientific frameworks, which combines the test activities of the test strategy with other quality assurance activities. We introduce a list of special characteristics for scientific software, which we use as rationale for the design of this process. We report on a case study, analyzing the feasibility and acceptance by developers for two parts of the design of the quality assurance process: variability model creation and desk-checking, a kind of lightweight review. Using FeatureIDE, an environment for feature-oriented software development as well as an automated test environment, we prototypically demonstrate the applicability of our approach.
- Conference Article
4
- 10.1109/tase52547.2021.00029
- Aug 1, 2021
Requirements-based testing is one of the most commonly used ways to ensure the correctness of software, especially for embedded control software in safety-critical domains such as spacecraft and railway systems. Many industrial standards such as DO-333 and EN50128 also require rigorous requirements-based software testing. To test embedded control software effectively and efficiently, generating high-quality test cases automatically is extremely important. However, existing methods for generating test cases from requirements require intensive manual effort and expertise. To address this problem, we propose an automatic requirements-based software testing method for embedded control software. To enable automatic test case generation and precise test oracle derivation, the requirements specification should be precise and readable for industrial practitioners. Therefore, as the first step, we use the light-weight domain-specific formal description language CASDL (Casco Accurate Specification Description Language), with which industrial practitioners define software requirements as formal specifications. Based on the formal specification, we propose an algorithm to automatically generate test inputs that satisfy the MC/DC criteria suggested by typical industrial standards, and precise test oracles can be derived by "running" the specification with such test inputs. To this end, we propose an algorithm for simulating the formal specification to generate the test oracles, i.e., the expected outputs corresponding to the test inputs. To facilitate the application of this method in industry, we have built a tool that can automatically perform the overall testing process. To validate and evaluate its effectiveness in real industrial projects, we have applied it in testing a real Automatic Train Protection (ATP) system provided by our industrial partner, Casco Signal Co., Ltd (one of the largest railway control system companies in China). 
In the case study on ATP requirements, our approach generated test cases for 129 requirement items following MC/DC criteria and caught 40 inconsistencies between Casco's requirements and implementation.
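To make the MC/DC criterion mentioned in this abstract concrete, the following sketch searches for independence pairs by brute force: for each condition, it looks for two input vectors that differ only in that condition and produce different decision outcomes. This is not the paper's algorithm and does not use CASDL; the decision `a and (b or c)` and the function names are assumptions chosen for illustration. Expected outputs (the oracle) are obtained by evaluating the decision itself, which loosely mirrors the idea of "running" a specification.

```python
from itertools import product


def mcdc_pairs(decision, n):
    """For each of the n conditions, find one independence pair.

    An independence pair is two input vectors that differ only in
    condition i and yield different decision outcomes.
    """
    pairs = {}
    for inputs in product([False, True], repeat=n):
        for i in range(n):
            # Only start from vectors where condition i is False, so each
            # candidate pair is examined once.
            if i in pairs or inputs[i]:
                continue
            flipped = inputs[:i] + (True,) + inputs[i + 1:]
            if decision(*inputs) != decision(*flipped):
                pairs[i] = (inputs, flipped)
    return pairs


def mcdc_tests(decision, n):
    """Return the distinct test inputs from all pairs, each with its expected output."""
    vectors = {v for pair in mcdc_pairs(decision, n).values() for v in pair}
    return [(v, decision(*v)) for v in sorted(vectors)]


def example_decision(a, b, c):
    """Hypothetical requirement: act when a holds and either b or c holds."""
    return a and (b or c)


if __name__ == "__main__":
    for vector, expected in mcdc_tests(example_decision, 3):
        print(vector, "->", expected)
```

For this three-condition decision the search yields four test vectors, which is the n + 1 minimum commonly cited for MC/DC. Brute force is exponential in the number of conditions, so it is only suitable for small decisions like this one.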
- Conference Article
- 10.1109/iccece52344.2021.9534854
- Aug 16, 2021
This study investigates different case studies and proposes an improvement for optimising a testing methodology. It analyses and provides an overview of different testing tools. The "TestBench" testing tool, which offers both manual and automatic testing, showed a positive performance gain in the optimisation process. Selenium, another tool used for automatic testing, was also investigated. Choosing the right tool for automated testing is crucial for a proper testing process. The study contributes a case-study analysis of the tools and scrutinises their performance through a detailed investigation of the correctness of testing using these tools. A regression analysis was performed to test the validity of the factors included in the case study, as well as its efficiency.
- Research Article
- 10.2298/csis170701006a
- Jan 1, 2018
- Computer Science and Information Systems
When creating a test automation infrastructure, one of the main considerations for the buildup process is its efficiency. A main source of improvement might be the reuse of test automation artifacts. Following that, one may ask "To what extent can the test automation artifacts be re-used?". In this paper we present a model and test automation architecture for achieving such a goal. Repository Driven Test Automation (RDTA) is a conceptual approach for the buildup process of test automation infrastructure that employs reuse of testing artifacts. This paper discusses aspects of reuse of software test automation artifacts on various levels. Then, practical implications and adjustments arising from the implementation of this new paradigm are discussed. The proposed concept is documented by a case study in an international innovative computer hardware manufacturer, one of the leaders in the market. The documented results are significant and confirm the validity of the concept.
- Research Article
9
- 10.5121/ijscai.2024.13101
- Feb 28, 2024
- International Journal on Soft Computing, Artificial Intelligence and Applications
This scholarly article delves into the intersection of Artificial Intelligence (AI) and Test Automation, thoroughly examining the challenges inherent in implementing AI methodologies and elucidating imperatives critical for successful integration within contemporary software testing frameworks. The research entails a comprehensive exploration of challenges, ranging from intricacies in data quality to algorithmic biases, tool complexities, and integration challenges, drawing on empirical evidence from case studies and real-world scenarios. The paper articulates imperatives essential for overcoming challenges and ensuring the efficacy of AI in test automation. It emphasizes the significance of structured training programs, meticulous data management strategies, and the cultivation of an organizational culture conducive to the seamless integration of AI technologies. Through a rigorous analysis of successful case studies, the article provides a scholarly basis for the formulation of strategies and solutions to surmount challenges faced by organizations adopting AI in testing practices. A visual matrix aligning challenges with corresponding imperatives adds scholarly rigor to the article, offering a comprehensive framework for understanding the intricate relationships between challenges and the imperative strategies required for resolution. Furthermore, the exploration of emerging trends and innovations anticipates the future trajectory of AI-driven test automation, contributing valuable insights for strategic planning in the realm of software testing. This scholarly work underscores the importance of a systematic and informed approach to AI in Test Automation. By addressing challenges with academic rigor and embracing imperative strategies grounded in empirical evidence, organizations can position themselves at the forefront of AI-driven testing practices, advancing the field with a scholarly foundation for continued exploration and innovation.
- Conference Article
4
- 10.1109/ase.1999.802262
- Oct 12, 1999
This paper presents results of a joint case study of Ericsson and the German cellular network provider Mannesmann Mobilfunk, targeted at automating type acceptance tests. Faced with a growing number of tests required to verify the quality of the telecom switch software, both companies seek to improve testing efficiency by means of test automation. In a joint effort, a test platform originally created by Ericsson for supporting statistical usage tests was enhanced with features designed by Mannesmann. The platform was then employed to automate existing test instructions and the gain of test automation was measured in all phases, from implementation through execution under type acceptance conditions to final analysis. This paper presents the results and conclusions of this case study.