Abstract

Millions of vulnerable consumer IoT devices in home networks enable cyber crimes that put user privacy and Internet security at risk. Internet service providers (ISPs) are well placed to mitigate these risks by automatically inferring the active IoT devices in each household and notifying users of vulnerable ones. Developing a scalable inference method that performs robustly across thousands of home networks is a non-trivial task. This paper focuses on the challenges of developing and applying data-driven inference models when labeled data of device behaviors is limited and the distribution of data changes across time and space (concept drift). Our contributions are fourfold: (1) We collect and analyze more than six million network traffic flows of 24 types of consumer IoT devices from 12 real homes over six weeks to highlight the challenge of temporal and spatial concept drift in the network behaviors of IoT devices, and we publicly release our training and testing instances; (2) We analyze the performance of two inference strategies in the presence of concept drift, namely “global inference” (one model trained on the combined labeled data from all training homes) and “contextualized inference” (several models, each trained on the labeled data from a single training home); (3) To manage concept drift, we develop a method that dynamically applies the “best” model (from a set) to the network traffic of unseen homes during the testing phase, yielding better performance in a fifth of scenarios when labels are available for the testing data (an ideal but unrealistic setting); and (4) We develop a method that automatically selects the best model without needing labels for unseen data (a realistic setting) and show that it can achieve 94% of the ideal model’s accuracy.
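
The difference between the two strategies, and the idea of choosing a model for an unseen home without labels, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the two-dimensional flow features, the toy homes `home_a` and `home_b`, the nearest-centroid classifier, and in particular the selection rule (pick the model with the highest mean prediction confidence on the unlabeled flows), which stands in for whatever selection method the authors actually use.

```python
from math import dist
from statistics import mean


def train(samples):
    """Fit a nearest-centroid model: one mean feature vector per device label."""
    grouped = {}
    for features, label in samples:
        grouped.setdefault(label, []).append(features)
    return {
        label: tuple(mean(column) for column in zip(*rows))
        for label, rows in grouped.items()
    }


def predict(model, features):
    """Return (label, confidence) for one flow.

    Confidence is a hypothetical proxy: it shrinks as the flow moves
    away from the nearest centroid.
    """
    label = min(model, key=lambda name: dist(model[name], features))
    confidence = 1.0 / (1.0 + dist(model[label], features))
    return label, confidence


def select_model(models, unlabeled_flows):
    """Pick the model name with the highest mean confidence (no labels needed)."""
    def mean_confidence(name):
        return mean(predict(models[name], flow)[1] for flow in unlabeled_flows)
    return max(models, key=mean_confidence)


# Toy labeled data (hypothetical): the same device types behave
# differently in each home, which mimics spatial concept drift.
homes = {
    "home_a": [((1.0, 1.0), "camera"), ((1.2, 0.8), "camera"),
               ((5.0, 5.0), "plug"), ((5.2, 4.8), "plug")],
    "home_b": [((3.0, 1.0), "camera"), ((3.2, 1.2), "camera"),
               ((7.0, 5.0), "plug"), ((6.8, 5.2), "plug")],
}

# Contextualized inference: one model per training home.
models = {home: train(samples) for home, samples in homes.items()}
# Global inference: a single model trained on all homes combined.
models["global"] = train([s for samples in homes.values() for s in samples])

# Unlabeled flows from an unseen home that happens to resemble home_b.
unseen_flows = [(3.1, 0.9), (6.9, 4.9)]
best = select_model(models, unseen_flows)
predictions = [predict(models[best], flow)[0] for flow in unseen_flows]
```

In this toy setup the unseen flows sit closest to the centroids learned from `home_b`, so the contextualized `home_b` model is selected over both `home_a` and the global model. Real traffic features and confidence measures would be considerably richer than this sketch.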
