ABSTRACT

Deep learning (DL) systems are increasingly used in safety-critical domains such as self-driving cars and unmanned aerial vehicles, which raises natural concerns about their trustworthiness. The DL libraries used to construct and execute DL models are necessarily involved when DL systems are tested; bugs in DL libraries can therefore cause unexpected behaviours in DL systems. The internal structures of DL libraries are exposed as APIs with different functionalities, and DL libraries give model developers access to DL techniques through various API parameter settings. These characteristics show that existing DL coverage criteria are not designed for DL libraries, and that traditional software coverage criteria do not apply to DL libraries either. This paper introduces the first set of coverage criteria designed specifically for the systematic measurement of DL library testing across various granularities. APIs, as the fundamental components of DL libraries, are used to define coverage criteria that gauge testing adequacy by thoroughly considering their invocation, implementation, parameter quantities and parameter attributes. Furthermore, several properties describing the relations between the coverage criteria are investigated. Experiments on the effectiveness of the proposed coverage criteria and a comparative analysis are conducted using interval estimation and hypothesis testing techniques on APIs from two well-known DL libraries. The experimental results demonstrate that the proposed coverage criteria are effective in measuring the test adequacy of DL libraries and can be used for the quantitative analysis of test model quality in DL libraries.
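To make the idea of API-level coverage concrete, the following is a minimal sketch, not the paper's formal definitions: it assumes we can record which library APIs a test model invokes and with which parameter settings, and it computes an illustrative invocation-coverage ratio and a per-API parameter-diversity count. The class name, the example API list, and the metric formulas are assumptions introduced here for illustration only.

```python
# Hypothetical sketch of API-level coverage tracking for a DL library.
# Not the paper's actual criteria; names and metrics are illustrative.
from collections import defaultdict


class ApiCoverage:
    """Tracks which APIs of a DL library are invoked by test models,
    and how many distinct parameter settings are exercised per API."""

    def __init__(self, all_apis):
        self.all_apis = set(all_apis)            # every public API of the library
        self.invoked = set()                     # APIs seen at least once
        self.param_settings = defaultdict(set)   # API name -> distinct parameter tuples

    def record(self, api_name, **params):
        """Record one API call made while building or running a test model."""
        self.invoked.add(api_name)
        self.param_settings[api_name].add(tuple(sorted(params.items())))

    def invocation_coverage(self):
        """Fraction of the library's APIs invoked by the test suite."""
        return len(self.invoked & self.all_apis) / len(self.all_apis)

    def parameter_diversity(self, api_name):
        """Number of distinct parameter settings exercised for one API."""
        return len(self.param_settings[api_name])


# Usage: pretend two APIs of a (hypothetical) library are called by a test model.
cov = ApiCoverage(all_apis=["conv2d", "dense", "dropout", "batch_norm"])
cov.record("conv2d", filters=32, kernel_size=3)
cov.record("conv2d", filters=64, kernel_size=5)
cov.record("dense", units=10)
print(cov.invocation_coverage())          # 0.5 (2 of 4 APIs invoked)
print(cov.parameter_diversity("conv2d"))  # 2 distinct parameter settings
```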