Devices with Wi-Fi turned on, such as smartphones, search for available networks by periodically broadcasting probe requests, which carry MAC addresses as device identifiers. To protect identity privacy, modern devices embed random MAC addresses in probe frames, a practice known as MAC address randomization. Such randomization disrupts frame association, inadvertently frustrating identity-oblivious statistical analyses such as people counting and trajectory inference. To address this, we propose Cappuccino, a novel privacy-preserving approach that captures the association of probe requests under MAC address randomization. Cappuccino first estimates pairwise frame correlation and then associates frames over time. For frame correlation, it employs a self-supervised estimator that jointly considers multiple modalities, i.e., information elements, sequence number, and received signal strength. For multiple-frame association, Cappuccino formulates the problem as a minimum-cost flow optimization. To the best of our knowledge, this is the first work that leverages self-supervised learning to estimate frame correlation from multiple modalities and formulates the probe request association problem as a network flow optimization. We have conducted extensive experiments in a leading, crowded shopping mall for more than three months. Cappuccino achieves V-measure scores above 0.85 ($>0.85$).
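For readers unfamiliar with the reported metric, the following is a minimal sketch of how a V-measure score can be computed for a frame association result. It assumes the standard definition (the harmonic mean of homogeneity and completeness, with equal weighting); the abstract does not state the exact evaluation procedure, and the names `true_devices` and `predicted_groups` are hypothetical, introduced only for illustration.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy H(labels) in nats."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def conditional_entropy(target, given):
    """Conditional entropy H(target | given) in nats."""
    n = len(target)
    joint = Counter(zip(given, target))
    given_counts = Counter(given)
    return -sum(
        (c / n) * log(c / given_counts[g]) for (g, _), c in joint.items()
    )


def v_measure(true_devices, predicted_groups):
    """Harmonic mean of homogeneity and completeness (equal weighting).

    true_devices[i]     -- ground-truth device of frame i (assumed known
                           for evaluation only)
    predicted_groups[i] -- group assigned to frame i by the association step
    """
    if len(true_devices) != len(predicted_groups):
        raise ValueError("label sequences must have the same length")
    if not true_devices:
        return 1.0

    h_true = entropy(true_devices)
    h_pred = entropy(predicted_groups)

    # Homogeneity: each predicted group contains frames of a single device.
    homogeneity = (
        1.0
        if h_true == 0
        else 1.0 - conditional_entropy(true_devices, predicted_groups) / h_true
    )
    # Completeness: all frames of a device end up in the same group.
    completeness = (
        1.0
        if h_pred == 0
        else 1.0 - conditional_entropy(predicted_groups, true_devices) / h_pred
    )

    if homogeneity + completeness == 0:
        return 0.0
    return 2 * homogeneity * completeness / (homogeneity + completeness)


# Toy example: four frames from two devices.
perfect = v_measure(["a", "a", "b", "b"], [0, 0, 1, 1])
merged = v_measure(["a", "a", "b", "b"], [0, 0, 0, 0])
```

In this toy example, `perfect` corresponds to an association that recovers both devices exactly, while `merged` lumps all frames into one group, which is complete but not homogeneous. A score close to 1 therefore indicates that the predicted groups match the true devices in both respects.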