Abstract

Keystroke data collected from smart devices includes various sensitive information about users. Collecting and analyzing such data raise serious privacy concerns. Google and Apple have recently applied local differential privacy (LDP) to address privacy issue on learning new words from users' keystroke data. However, these solutions require multiple LDP reports for a single word, which result in inefficient use of privacy budget and high computational cost. In this paper, we develop a novel algorithm for learning new words under LDP. Unlike the existing solutions, the proposed method generates only one LDP report for a single word. This enables the proposed method to use full privacy budget for generating a report and brings the benefit that the proposed method provides better utility at the same privacy degree than the existing methods. In our algorithm, each user appends a hash value to new word and sends only one LDP report of an n-gram selected randomly from the string packed by each new word and its hash value. The server then decodes frequent n-grams at each position of the string and discovers the candidate words by exploring graph-theoretic links between n-grams and checking integrity of candidates with hash values. Frequencies of frequent new words discovered are estimated from distribution estimates of n-grams by robust regression. We theoretically show that our algorithm can recover popular new words even though the server does not know the domain of the raw data. In addition, we theoretically and empirically demonstrate that our algorithm achieves higher accuracy compared to the existing solutions.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call