Abstract
Although deep learning algorithms can achieve high performance, deep models may not learn the right concepts and can easily overfit their training datasets. In the context of IoT devices, the problem is further exacerbated by three factors. First, traffic may be encrypted, allowing very little visibility into the activity of the endpoints. Second, devices of different models and from different manufacturers may exhibit very different behaviors. Finally, unlike in domains such as computer vision or natural language processing, there is no well-accepted representation for the network data that characterizes IoT devices. In this work, we capture real network traffic from different environments, and we demonstrate that training models to detect specific classes of IoT devices (e.g., cameras) using state-of-the-art techniques can lead to overfitting and very poor performance on independent datasets. However, we then show that by applying domain knowledge, one can manually define engineered features and train simple models (e.g., a decision tree) that achieve an F1 score of 0.956 on an independent dataset. These results show the feasibility of training generalizable models, but at the same time raise questions about how best to transform and represent the raw network data to train classifiers for other classes of IoT devices (e.g., hubs, motion sensors) while minimizing manual feature engineering. We elaborate on the challenges, drawing analogies with other fields such as natural language processing.