Abstract

Federated learning (FL) is a decentralized machine learning (ML) technique that learns from distributed data by moving the training process from a centralized server to many clients rather than centralizing the client data, as is common in classical machine learning. The recent literature on federated learning often focuses on domain-specific use cases (e.g., IoT), investigates various privacy concerns (e.g., membership inference), or analyzes the impact of adversarial attacks (e.g., poisoning) and possible countermeasures. In these works, it is common for the server to have already chosen a specific machine learning model and predefined hyperparameters prior to initiating the distributed training process. This decision rests on the server's ability to accomplish the task by either reusing well-established neural network architectures suitable for the specific task (e.g., ResNet-50 for image classification) or evaluating the adequacy of a model using the limited data it has access to. Additionally, the server may assess publicly available datasets, which may or may not accurately represent real-world data distributions. In this paper, we address the challenge where this step—i.e., the ML model selection and hyperparameter optimization—is not possible in a centralized manner. In such a context, the data of a single client may not be sufficient or representative enough to construct an ML model configuration that is effective for all clients. In real-world deployments, the data on the different clients may be imbalanced and heterogeneously distributed, and the performance impact of countermeasures is often unclear upfront. While various automated machine learning (AutoML) frameworks have been proposed for classical machine learning and deep learning in a centralized setting, we investigate the practical feasibility of AutoML in a federated learning context while taking into account the presence of security and privacy countermeasures.
We implemented and validated our proof-of-concept framework, called AutoFL, on top of open-source libraries for machine learning, federated learning, and hyperparameter optimization, and have demonstrated the added value of our framework with public datasets in different scenarios.
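To make the problem setting concrete, the sketch below illustrates the core mechanics the abstract describes: clients train locally on data that never leaves them, a server averages the resulting models (federated averaging), and hyperparameter selection must be driven by federated evaluation rather than a centralized dataset. This is a minimal illustrative toy, not the AutoFL framework itself; the model (a scalar linear fit), the candidate learning rates, and all function names are assumptions made for the example.

```python
def client_update(w, data, lr, epochs=5):
    # Local training on one client's (x, y) pairs for a scalar model y = w * x.
    # The raw data stays on the client; only the updated weight is returned.
    for _ in range(epochs):
        for x, y in data:
            grad = 2 * (w * x - y) * x  # gradient of squared error
            w -= lr * grad
    return w

def fed_avg_round(w, clients, lr):
    # One federated round: every client trains locally, the server averages.
    local_ws = [client_update(w, data, lr) for data in clients]
    return sum(local_ws) / len(local_ws)

def federated_mse(w, clients):
    # Global evaluation metric aggregated across all clients' data.
    errs = [(w * x - y) ** 2 for data in clients for x, y in data]
    return sum(errs) / len(errs)

# Synthetic, heterogeneous client data drawn from y = 2x: clients differ in
# both sample count and input range, mimicking imbalanced distributions.
clients = [
    [(x, 2 * x) for x in (0.1, 0.2, 0.3)],  # small inputs, three samples
    [(x, 2 * x) for x in (1.0, 1.5)],       # larger inputs, two samples
    [(x, 2 * x) for x in (0.5,)],           # a single sample
]

def federated_hpo(candidate_lrs, rounds=20):
    # Naive federated hyperparameter search: run full federated training once
    # per candidate learning rate and keep the configuration with the lowest
    # federated MSE. Real AutoML tooling would prune or sample configurations.
    best = None
    for lr in candidate_lrs:
        w = 0.0
        for _ in range(rounds):
            w = fed_avg_round(w, clients, lr)
        score = federated_mse(w, clients)
        if best is None or score < best[1]:
            best = (lr, score, w)
    return best

best_lr, best_mse, best_w = federated_hpo([0.001, 0.05, 0.3])
```

Even in this toy, the selected learning rate depends on the clients' data distributions, which the server never observes directly; that is the gap a federated AutoML framework has to bridge.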

