Deep Reinforcement Learning for Black-box Testing of Android Apps

Andrea Romdhana, Paolo Tonella, Alessio Merlo, Mariano Ceccato

doi:10.1145/3502868

Abstract

The state space of Android apps is huge, and its thorough exploration during testing remains a significant challenge. The best exploration strategy depends heavily on the features of the app under test. Reinforcement Learning (RL) is a machine learning technique that learns the optimal strategy to solve a task by trial and error, guided by positive or negative reward rather than by explicit supervision. Deep RL is a recent extension of RL that takes advantage of the learning capabilities of neural networks. Such capabilities make Deep RL suitable for complex exploration spaces such as that of Android apps. However, state-of-the-art, publicly available tools only support basic, Tabular RL. We have developed ARES, a Deep RL approach for black-box testing of Android apps. Experimental results show that it achieves higher coverage and fault revelation than the baselines, which include state-of-the-art tools such as TimeMachine and Q-Testing. We also investigated the reasons behind this performance qualitatively, and we identified the presence of chained and blocking activities as the key features of Android apps that make Deep RL particularly effective on them. Moreover, we have developed FATE to fine-tune the hyperparameters of Deep RL algorithms on simulated apps, since carrying out this tuning on real apps is computationally expensive.

Highlights

The complexity of mobile applications keeps growing, as apps provide increasingly advanced services to their users
Experimental results confirmed the hypothesis that Deep Reinforcement Learning (RL) outperforms Tabular RL in exploring the state space of Android apps, as ARES exposed the highest number of faults and obtained the highest code coverage
We carried out a qualitative analysis showing that the features of Android apps that make Deep RL adequate include, among others, the presence of concatenated activities and blocking activities protected by authentication

Summary

Introduction

The complexity of mobile applications (hereafter, apps) keeps growing, as apps provide increasingly advanced services to their users. Automated testing of mobile apps is still an open problem, and the complexity of current apps makes their exploration harder than in the past, since they can contain states that are difficult to reach and events that are hard to trigger. Structural strategies [7, 16, 35] generate coverage-oriented inputs using symbolic execution or evolutionary algorithms. These strategies are more powerful, since they are guided by a specific coverage target. However, they do not take advantage of past exploration successes to dynamically learn the most effective exploration strategy.

In RL, an agent learns such a strategy by interacting with an environment (here, the app under test) over discrete time steps $t$. At each step, the agent receives an observation $x_t$ and takes an action $a_t$ that causes the environment to transition from state $s_t$ to state $s_{t+1}$. The agent then receives a scalar reward $R(x_t, a_t, x_{t+1})$ that quantifies the goodness of the last transition.
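The agent-environment loop above can be illustrated with tabular Q-learning, an instance of the basic Tabular RL that the abstract contrasts with Deep RL. This is a minimal, hypothetical sketch, not ARES or FATE: the `ToyApp` class, its activities (`LOGIN`, `HOME`, `LIST`, `DETAIL`), the transition table, the `reward` function and the hyperparameters (`alpha`, `gamma`, `epsilon`) are all illustrative assumptions. The toy app mimics a blocking login activity followed by a chain of activities, and it assumes the observation $x_t$ coincides with the state $s_t$.

```python
import random

# Hypothetical toy "app": each state is an activity, each action an abstract GUI event.
LOGIN, HOME, LIST, DETAIL = range(4)
N_ACTIONS = 3


def reward(x_t, a_t, x_next):
    """Assumed reward R(x_t, a_t, x_{t+1}); NOT the reward function used by ARES."""
    if x_next == DETAIL:
        return 10.0   # reached the hard-to-reach activity
    if x_next == x_t:
        return -1.0   # the event had no visible effect
    return 0.0


class ToyApp:
    """Hypothetical simulated app: blocking LOGIN, then the chain HOME -> LIST -> DETAIL.

    Simplifying assumption: the observation x_t is the current activity, i.e. x_t == s_t.
    """

    # TRANSITIONS[activity][action] -> next activity
    TRANSITIONS = {
        LOGIN: [HOME, LOGIN, LOGIN],  # only event 0 (valid credentials) unblocks login
        HOME: [HOME, LIST, LOGIN],    # event 2 logs out
        LIST: [HOME, LIST, DETAIL],   # event 2 opens the deep DETAIL activity
    }

    def reset(self):
        self.activity = LOGIN
        return self.activity

    def step(self, action):
        x_t = self.activity
        x_next = self.TRANSITIONS[x_t][action]
        self.activity = x_next
        done = x_next == DETAIL  # episode ends once the target activity is reached
        return x_next, reward(x_t, action, x_next), done


def greedy(q, x):
    """Action with the highest Q-value in activity x (ties go to the lowest index)."""
    return max(range(N_ACTIONS), key=lambda a: q[x][a])


def train(episodes=300, max_steps=20, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    env = ToyApp()
    q = [[0.0] * N_ACTIONS for _ in range(4)]  # Tabular RL: one entry per (activity, event)
    for _ in range(episodes):
        x = env.reset()
        for _ in range(max_steps):
            # epsilon-greedy: explore with probability epsilon, otherwise exploit
            a = rng.randrange(N_ACTIONS) if rng.random() < epsilon else greedy(q, x)
            x_next, r, done = env.step(a)
            # Q-learning update towards r + gamma * max_a' Q(x_{t+1}, a')
            target = r if done else r + gamma * max(q[x_next])
            q[x][a] += alpha * (target - q[x][a])
            x = x_next
            if done:
                break
    return q


def greedy_rollout(q, max_steps=10):
    """Follow the learned greedy policy from LOGIN and record the visited activities."""
    env = ToyApp()
    x = env.reset()
    path = [x]
    for _ in range(max_steps):
        x, _, done = env.step(greedy(q, x))
        path.append(x)
        if done:
            break
    return path


if __name__ == "__main__":
    q = train()
    print(greedy_rollout(q))
```

The table `q` holds one value per (activity, event) pair, which is only feasible because the toy app has four activities and three events. A real app exposes far richer observations, which is where Deep RL replaces the table with a neural network that approximates the Q-values.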

Objectives

Results

Conclusion