SUMMARY This letter proposes a mobile application framework named erasable photograph tagging (EPT) for photograph annotation and fast retrieval. The smartphone owner’s voice is employed as a tag and hidden in the host photograph, so that no extra feature database is needed for retrieval. These digitized tags can be erased at any time, leaving no distortion in the recovered photograph.

key words: smartphone application, speech processing, reversible data hiding, robust hashing, image retrieval

1. Introduction

In recent years, the wide use of smartphones and the rapid development of mobile application (app for short) markets have mutually accelerated each other. These apps are software applications that include games, banking services, order tracking, ticket purchases, and so on. Nowadays, a variety of mobile apps can be downloaded from a specified platform and installed on a target device, and their popularity and usage continue to grow among smartphone users.

This letter proposes a novel mobile app framework named erasable photograph tagging (EPT). It is developed for photograph annotation and retrieval based on the smartphone owner’s voice. In particular, the owner’s voice is first captured via the recorder, then digitized into a bit-stream tag, and finally hidden imperceptibly in the currently captured photograph. The novelty lies in that the speech information used as invisible tags can be easily erased, and the recovered photograph is exactly the same as its original version. Only the smartphone’s camera, recorder and photograph album are required to realize the EPT framework. From another point of view, this work can be regarded as a kind of cross-media analysis, which is an emerging research area in multimedia.
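
The embed-then-erase idea described above can be illustrated with a minimal sketch. This section does not specify the letter's actual hiding scheme, so the code below substitutes classic histogram shifting, a well-known reversible data hiding technique, purely as an illustration. The function names embed and erase, the flat pixel list, and the key tuple passed as side information are all hypothetical, and the robustness that the key words suggest is not modelled here.

```python
from collections import Counter

def embed(pixels, bits):
    """Hide bits in a flat list of 8-bit pixel values (toy histogram shifting).

    Assumption: this is an illustrative stand-in, not the letter's own scheme.
    Returns the marked pixels and a key needed to erase the tag later.
    """
    hist = Counter(pixels)
    # Most frequent gray level: its pixels carry the payload.
    peak = max(range(256), key=lambda v: hist[v])
    # First unused gray level above the peak: absorbs the shift.
    zero = next((v for v in range(peak + 1, 256) if hist[v] == 0), None)
    if zero is None:
        raise ValueError("no empty bin above the peak")
    if len(bits) > hist[peak]:
        raise ValueError("payload exceeds capacity")
    payload = iter(bits)
    marked = []
    for v in pixels:
        if peak < v < zero:
            marked.append(v + 1)                 # make room next to the peak
        elif v == peak:
            marked.append(v + next(payload, 0))  # bit 1 -> peak + 1, bit 0 -> peak
        else:
            marked.append(v)
    return marked, (peak, zero, len(bits))

def erase(marked, key):
    """Extract the hidden bits and restore the original pixels exactly."""
    peak, zero, n_bits = key
    bits, restored = [], []
    for v in marked:
        if v in (peak, peak + 1):                # carrier pixel
            if len(bits) < n_bits:
                bits.append(v - peak)
            restored.append(peak)
        elif peak + 1 < v <= zero:
            restored.append(v - 1)               # undo the shift
        else:
            restored.append(v)
    return bits, restored

# Usage: a short tag bit-stream hidden in, then erased from, a tiny "photograph".
photo = [5, 5, 5, 6, 7, 5, 9, 5, 3]
tag = [1, 0, 1, 1]
marked, key = embed(photo, tag)
recovered_tag, recovered_photo = erase(marked, key)
```

In this sketch each pixel changes by at most one gray level while the tag is present, and erasing the tag returns the pixel list bit for bit, which mirrors the "no distortion remaining" property claimed for EPT.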