Given a million escort advertisements, how can we spot near-duplicates? Such micro-clusters of ads are usually signals of human trafficking (HT). How can we summarize them to convince law enforcement to act? Spotting micro-clusters of near-duplicate documents is useful in multiple, additional settings, including spam-bot detection in Twitter ads, plagiarism, and more. We present InfoShield , which makes the following contributions: practical , being scalable and effective on real data; parameter-free and principled , requiring no user-defined parameters; interpretable , finding a document to be the cluster representative, highlighting all the common phrases, and automatically detecting “slots” (i.e., phrases that differ in every document); and generalizable , beating or matching domain-specific methods in Twitter bot detection and HT detection, respectively, as well as being language independent. Interpretability is particularly important for the anti-HT domain, where law enforcement must visually inspect ads. Our experiments on real data show that InfoShield correctly identifies Twitter bots with an F1 score over 90% and detects HT ads with 84% precision. Moreover, it is scalable, requiring about 8 hours for 4 million documents on a stock laptop. Our incremental version, DeltaShield , allows for fast, incremental updates, with minor loss of accuracy.