Perceptual image hashing refers to a class of algorithms that produce content-based image hashes. These systems use specialized perceptual hash algorithms such as Phash, Microsoft’s PhotoDNA, or Facebook’s PDQ to generate a compact digest of an image file that can be approximately compared against a database of digests of known illicit content. In this work, we measured the time each perceptual hashing algorithm takes to generate a hash and evaluated the algorithms on a dataset of two million images. Nine variants of each original image were produced with content-preserving operations, and several distances between the resulting hashes were calculated. Several studies exist in the prior literature, but they use small datasets and rarely examine the tradeoff between hash computation time and robustness. This work shows that existing perceptual hashing algorithms are robust to most content-preserving operations and that there is a tradeoff between computation time and robustness.
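The sketch below illustrates the general idea of such a pipeline: a DCT-based perceptual digest (in the spirit of Phash) and a Hamming-distance comparison between an original image and a content-preserving variant. It is a minimal approximation for illustration only, assuming Pillow, NumPy, and SciPy; it is not the exact Phash, PhotoDNA, or PDQ implementations evaluated in the paper, and the file names are placeholders.

```python
# Minimal sketch of a DCT-based perceptual hash and Hamming-distance comparison.
# Assumptions: Pillow, NumPy, and SciPy are installed; this approximates the
# Phash approach and is not the paper's evaluated implementation.
import numpy as np
from PIL import Image
from scipy.fftpack import dct

def perceptual_hash(path, hash_size=8, highfreq_factor=4):
    """Resize to grayscale, apply a 2-D DCT, and threshold the low-frequency
    block at its median to produce a compact boolean digest (64 bits by default)."""
    size = hash_size * highfreq_factor
    img = Image.open(path).convert("L").resize((size, size), Image.LANCZOS)
    pixels = np.asarray(img, dtype=np.float64)
    coeffs = dct(dct(pixels, axis=0), axis=1)      # 2-D DCT of the image
    low = coeffs[:hash_size, :hash_size]           # keep the low-frequency block
    return (low > np.median(low)).flatten()        # boolean digest

def hamming_distance(h1, h2):
    """Count differing bits; a small distance suggests perceptually similar images."""
    return int(np.count_nonzero(h1 != h2))

# Example usage with placeholder file names: compare an original against a
# content-preserving variant (e.g., a resized copy).
# d = hamming_distance(perceptual_hash("original.jpg"),
#                      perceptual_hash("resized_copy.jpg"))
# print("Hamming distance:", d)   # small d => likely the same underlying content
```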