Can we trust AI? Towards practical implementation and theoretical analysis in trustworthy machine learning

Kaidi Xu

doi:10.17760/d20413930

Abstract

Deep learning, in the form of deep neural networks (DNNs), has achieved extraordinary performance in many application domains, such as image classification, object detection and recognition, natural language processing, and medical image analysis. It is also well accepted that DNNs are vulnerable to adversarial attacks, which raises concerns about deploying DNNs in security-critical applications, where such attacks may have disastrous consequences. Adversarial attacks are usually implemented by generating adversarial examples, i.e., by adding sophisticated perturbations to benign examples so that the DNN classifies the resulting adversarial examples as target (wrong) labels instead of the correct labels of the benign examples. Adversarial machine learning studies this phenomenon and leverages it to build robust machine learning systems and to explain DNNs. In this dissertation, we examine the mechanisms of adversarial machine learning both empirically and theoretically. Specifically, we first introduce a uniform adversarial attack generation framework, structured attack (StrAttack), which explores group sparsity in adversarial perturbations by sliding a mask across images to extract key spatial structures. Second, we discuss the feasibility of adversarial attacks in the physical world and introduce a powerful framework, Expectation over Transformation (EoT). Using EoT with a Thin Plate Spline (TPS) transformation, we generate Adversarial T-shirts, robust physical adversarial examples that evade person detectors even when the shirt undergoes non-rigid deformation caused by a moving person's pose changes. Third, we turn to the defense side and propose the first adversarial training method based on graph neural networks. Fourth, we introduce Linear relaxation based perturbation analysis (LiRPA) for neural networks, which computes provable linear bounds on output neurons given a certain amount of input perturbation. LiRPA studies adversarial examples theoretically and can guarantee the test accuracy of a model under given perturbation constraints. Finally, by combining the efficient LiRPA with branch and bound, we speed up the conventional linear programming-based complete verification framework by an order of magnitude. In future work, we plan to study a novel patch transformer network that faithfully models real-world physical transformations empirically. In addition, in the direction of formal robustness, we plan to explore real-time complete verification, in which the verifier, given sufficient time, gives a definite "yes/no" answer for the property under verification, and does so efficiently. Our LiRPA framework, combined with GPUs, can potentially accelerate this procedure. --Author's abstract
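To make the idea of provable output bounds under a bounded input perturbation concrete, the following is a minimal sketch, not the dissertation's LiRPA method. It uses interval bound propagation, a simpler and looser relative of linear relaxation, on a small fully connected ReLU network. The function names, the network shape, and the random weights are all hypothetical and chosen only for illustration.

```python
import numpy as np


def affine_bounds(lower, upper, weight, bias):
    """Propagate an axis-aligned box [lower, upper] through y = W x + b."""
    center = (lower + upper) / 2.0
    radius = (upper - lower) / 2.0
    out_center = weight @ center + bias
    # |W| maps the box radius to the worst-case deviation of each output.
    out_radius = np.abs(weight) @ radius
    return out_center - out_radius, out_center + out_radius


def relu_bounds(lower, upper):
    """ReLU is monotone, so it can be applied to both bounds directly."""
    return np.maximum(lower, 0.0), np.maximum(upper, 0.0)


def output_bounds(x, eps, layers):
    """Bound the network output for every input within L-infinity radius eps of x."""
    lower, upper = x - eps, x + eps
    for i, (weight, bias) in enumerate(layers):
        lower, upper = affine_bounds(lower, upper, weight, bias)
        if i < len(layers) - 1:  # no activation after the final layer
            lower, upper = relu_bounds(lower, upper)
    return lower, upper


def forward(x, layers):
    """Ordinary forward pass of the same network, for comparison."""
    for i, (weight, bias) in enumerate(layers):
        x = weight @ x + bias
        if i < len(layers) - 1:
            x = np.maximum(x, 0.0)
    return x


def is_certified(lower, upper, label):
    """True if the label's lower bound exceeds every other class's upper bound."""
    others = np.delete(upper, label)
    return bool(np.all(lower[label] > others))


# Hypothetical toy network: 3 inputs, 4 hidden units, 2 output classes.
rng = np.random.default_rng(0)
toy_layers = [
    (rng.standard_normal((4, 3)), rng.standard_normal(4)),
    (rng.standard_normal((2, 4)), rng.standard_normal(2)),
]
toy_x = rng.standard_normal(3)
toy_lower, toy_upper = output_bounds(toy_x, 0.05, toy_layers)
toy_label = int(np.argmax(forward(toy_x, toy_layers)))
print("certified at eps=0.05:", is_certified(toy_lower, toy_upper, toy_label))
```

If `is_certified` returns True, no perturbation inside the box can change the predicted class, which is the sense in which bounds of this kind can guarantee accuracy under a perturbation constraint. Interval bounds are usually much looser than linear-relaxation bounds, so a False result here does not mean an adversarial example exists.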
