Fusing Visual and Textual Information to Determine Content Safety

Rodrigo Leonardo, Amber Hu, Divyaa Ravichandran, Mohammad Uzair, Keishin Nishiyama, Iris Fu, Qiujing Lu, Sooraj Mangalath Subrahmannian

doi:10.1109/icmla.2019.00324

Abstract

In advertising, determining the content safety of web pages is a significant concern, since advertisers do not want their brands associated with threatening content. At the same time, publishers want to maximize the number of web pages on which they can place ads. Content-safety classification must therefore strike a fine balance to satisfy both advertisers and publishers. In this paper, we propose a multimodal machine learning framework that fuses visual and textual information from web pages to improve on current predictions of content safety. The primary focus is on late fusion, which combines the final outputs of separate per-modality models, such as an image model and a text model, to arrive at a single decision. The framework is fully automated and performs both binary and multilabel classification using late fusion techniques. We also introduce additional work on early fusion, which extracts intermediate features from the two separate models and fuses them. Our algorithms are applied to data extracted from web pages relevant to the advertising industry. Both our late and early fusion methods obtain significant improvements over algorithms currently in use.
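
To make the distinction between the two fusion styles concrete, here is a minimal Python sketch. It is an illustration only, not the method used in the paper: the abstract does not state the fusion rule, so the weighted average, the 0.5 threshold, and the names `late_fusion` and `early_fusion_features` are all assumptions chosen for the example.

```python
from typing import List, Sequence


def late_fusion(
    image_probs: Sequence[float],
    text_probs: Sequence[float],
    image_weight: float = 0.5,  # assumed weighting, not from the paper
    threshold: float = 0.5,     # assumed decision threshold
) -> List[bool]:
    """Combine the final per-label outputs of two models into one decision.

    Each input holds one probability per label (a single entry for the
    binary case, several for the multilabel case).
    """
    if len(image_probs) != len(text_probs):
        raise ValueError("both models must score the same set of labels")
    fused = [
        image_weight * p_img + (1.0 - image_weight) * p_txt
        for p_img, p_txt in zip(image_probs, text_probs)
    ]
    # A label is flagged when its fused score reaches the threshold.
    return [score >= threshold for score in fused]


def early_fusion_features(
    image_features: Sequence[float],
    text_features: Sequence[float],
) -> List[float]:
    """Concatenate intermediate features so one downstream model sees both."""
    return list(image_features) + list(text_features)
```

For example, `late_fusion([0.9, 0.2], [0.7, 0.1])` flags the first hypothetical label and not the second, while `early_fusion_features` only builds the joint feature vector; a separate classifier (not shown) would still have to be trained on it.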

Full Text