Abstract
The establishment of large datasets for affective video content analysis, such as LIRIS-ACCEDE, makes it possible to use the representational power of deep neural networks (DNNs) to model the complex process by which videos elicit affective responses from viewers. However, label noise in these datasets poses a considerable challenge to both the training and the evaluation of DNNs. DNNs are optimized with stochastic gradient descent (SGD), and label noise in the training set leads to an inaccurate estimate of the gradient, which may cause the model to converge to a suboptimal solution. In addition, label noise in the test set renders the results of model evaluation untrustworthy. In this article, we propose a multimodal deep quality embedding network (MMDQEN) for affective video content analysis. Specifically, MMDQEN infers the latent label and the label quality from noisy training samples so that cleaner supervision signals are provided to the DNN-based affective classifier. A tractable objective for MMDQEN is derived using variational inference and a conditional independence assumption. In addition, to avoid the evaluation bias caused by annotation noise in the test set, we establish new test sets based on the original LIRIS-ACCEDE database, which we name LIRIS-ACCEDE-RANK, in which the samples are ranked according to their label uncertainty level, and we introduce corresponding evaluation metrics to further reveal the performance of different models. Experiments conducted on both the LIRIS-ACCEDE and LIRIS-ACCEDE-RANK datasets demonstrate the effectiveness of the proposed method.
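To make the idea of evaluating on an uncertainty-ranked test set concrete, the following is a minimal sketch, not the metric defined in the article. It assumes that each test sample already carries a scalar label-uncertainty score (how such a score is obtained is not specified here), and the function name `accuracy_by_certainty` and the `fractions` parameter are hypothetical. The sketch ranks samples from least to most uncertain and reports classification accuracy on the most certain fraction of the test set.

```python
from typing import List, Sequence, Tuple


def accuracy_by_certainty(
    labels: Sequence[int],
    predictions: Sequence[int],
    uncertainty: Sequence[float],
    fractions: Sequence[float] = (0.25, 0.5, 1.0),
) -> List[Tuple[float, float]]:
    """Accuracy on the most certain fraction of a test set, per fraction.

    Illustrative only: the uncertainty scores are assumed to be given,
    and this is not the evaluation metric proposed in the article.
    """
    n = len(labels)
    if not (n == len(predictions) == len(uncertainty)):
        raise ValueError("inputs must have the same length")
    if n == 0:
        raise ValueError("inputs must be non-empty")

    # Rank sample indices from least to most uncertain label.
    order = sorted(range(n), key=lambda i: uncertainty[i])

    results = []
    for frac in fractions:
        # Keep at least one sample so the accuracy is always defined.
        k = max(1, int(round(frac * n)))
        kept = order[:k]
        correct = sum(1 for i in kept if labels[i] == predictions[i])
        results.append((frac, correct / k))
    return results
```

Under these assumptions, a model whose accuracy rises as the retained fraction shrinks is making most of its errors on samples with uncertain labels, which is the kind of distinction a single accuracy figure on a noisy test set cannot show.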