Most methods for facial action unit (AU) recognition require training images that are fully AU labeled, yet manual AU annotation is time intensive. To alleviate this, we propose a novel dual learning framework and apply it to AU detection under two scenarios: semisupervised AU detection with partially AU-labeled and fully expression-labeled samples, and weakly supervised AU detection with fully expression-labeled samples alone. We leverage two forms of auxiliary information. The first is the probabilistic duality between the AU detection task and its dual task, in this case the face synthesis task given AU labels. The second comprises three kinds of inherent dependencies: 1) the dependencies among multiple AUs; 2) the dependencies between expression and AUs; and 3) the dependencies between facial features and AUs. Specifically, the proposed method consists of a classifier, an image generator, and a discriminator. The classifier and generator yield face-AU-expression tuples, which are forced to converge to the ground-truth joint distribution; this joint distribution encodes the three kinds of dependencies above. We reconstruct the input face and AU labels and introduce two reconstruction losses. In the semisupervised scenario, a supervised loss on AU-labeled samples is also incorporated into the full objective. In the weakly supervised scenario, we generate pseudo-paired data according to domain knowledge about expression and AUs. Semisupervised and weakly supervised experiments on three widely used datasets demonstrate the superiority of the proposed method over existing works for both AU detection and face synthesis.
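To make the objective concrete, below is a minimal PyTorch sketch of the dual-learning losses described above. It is an illustration under stated assumptions, not the authors' implementation: the MLP stand-ins for the three modules, the dimensions (IMG_DIM, N_AUS, N_EXPR), the equal loss weighting, and the helper name losses are all hypothetical, since the abstract does not specify architectures or hyperparameters, and the discriminator's own update on real tuples is omitted for brevity.

```python
# Minimal sketch of the dual objective: a classifier (primal task, AU detection),
# a generator (dual task, face synthesis from AU and expression labels), and a
# discriminator over face-AU-expression tuples. All sizes are illustrative.
import torch
import torch.nn as nn

IMG_DIM, N_AUS, N_EXPR = 64 * 64, 12, 6  # hypothetical dimensions

classifier = nn.Sequential(nn.Linear(IMG_DIM, 256), nn.ReLU(),
                           nn.Linear(256, N_AUS))            # face -> AU logits
generator = nn.Sequential(nn.Linear(N_AUS + N_EXPR, 256), nn.ReLU(),
                          nn.Linear(256, IMG_DIM))           # (AUs, expr) -> face
discriminator = nn.Sequential(nn.Linear(IMG_DIM + N_AUS + N_EXPR, 256), nn.ReLU(),
                              nn.Linear(256, 1))             # real/fake tuple score

bce = nn.BCEWithLogitsLoss()
mse = nn.MSELoss()

def losses(face, expr_onehot, au_labels=None):
    """Combined loss on one batch.

    face:        (B, IMG_DIM) real faces
    expr_onehot: (B, N_EXPR)  expression labels (always available)
    au_labels:   (B, N_AUS)   AU labels, or None in the weakly supervised case
    """
    # Primal task: detect AUs from the face.
    au_logits = classifier(face)
    au_probs = torch.sigmoid(au_logits)

    # Dual task: synthesize a face from the detected AUs and the expression.
    fake_face = generator(torch.cat([au_probs, expr_onehot], dim=1))

    # Adversarial term: push generated face-AU-expression tuples toward the
    # ground-truth joint distribution, which implicitly carries the AU-AU,
    # expression-AU, and feature-AU dependencies.
    fake_tuple = torch.cat([fake_face, au_probs, expr_onehot], dim=1)
    adv = bce(discriminator(fake_tuple), torch.ones(face.size(0), 1))

    # Two reconstruction terms: reconstruct the input face, and re-detect AUs
    # from the synthesized face to reconstruct the AU labels.
    rec_face = mse(fake_face, face)
    rec_au = mse(torch.sigmoid(classifier(fake_face)), au_probs.detach())

    total = adv + rec_face + rec_au
    if au_labels is not None:  # semisupervised: supervised loss on labeled samples
        total = total + bce(au_logits, au_labels)
    return total

# Example weakly supervised step (expression labels only, au_labels=None):
face = torch.rand(8, IMG_DIM)
expr = torch.eye(N_EXPR)[torch.randint(0, N_EXPR, (8,))]
losses(face, expr).backward()
```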