Abstract

Nowadays, people generate and share massive amounts of content on online platforms (e.g., social networks, blogs). In 2021, the 1.9 billion daily active Facebook users posted around 150 thousand photos every minute. Content moderators constantly monitor these online platforms to prevent the spreading of inappropriate content (e.g., hate speech, nudity images). Based on deep learning (DL) advances, Automatic Content Moderators (ACM) help human moderators handle high data volume. Despite their advantages, attackers can exploit weaknesses of DL components (e.g., preprocessing, model) to affect their performance. Therefore, an attacker can leverage such techniques to spread inappropriate content by evading ACM.In this work, we analyzed 4600 potentially toxic Instagram posts, and we discovered that 44% of them adopt obfuscations that might undermine ACM. As these posts are reminiscent of captchas (i.e., not understandable by automated mechanisms), we coin this threat as Captcha Attack (CAPA). Our contributions start by proposing a CAPA taxonomy to better understand how ACM is vulnerable to obfuscation attacks. We then focus on the broad sub-category of CAPA using textual Captcha Challenges, namely CC-CAPA, and we empirically demonstrate that it evades real-world ACM (i.e., Amazon, Google, Microsoft) with 100% accuracy. Our investigation revealed that ACM failures are caused by the OCR text extraction phase. The training of OCRs to withstand such obfuscation is therefore crucial, but huge amounts of data are required. Thus, we investigate methods to identify CC-CAPA samples from large sets of data (originated by three OSN – Pinterest, Twitter, Yahoo-Flickr), and we empirically demonstrate that supervised techniques identify target styles of samples almost perfectly. Unsupervised solutions, on the other hand, represent a solid methodology for inspecting uncommon data to detect new obfuscation techniques.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call