In recent years, the burgeoning data market has witnessed a surge in data exchange, playing a pivotal role in augmenting the predictive and decision-making capabilities of machine learning. Despite these advancements, persistent concerns over data privacy have imposed stringent limitations on data sharing and trading. Consequently, the data market is undergoing a transformative shift from pricing individual datasets to pricing the models themselves. Training high-performance machine learning models with the restricted data of a single client remains a substantial challenge. Federated learning has emerged as a popular solution, enabling collaborative model training without transferring client data beyond local environments. However, training federated models within a data market introduces several challenges, including the effective selection of clients for model training and the optimization of utility through model pricing. In response to these challenges, we propose the Tiered Federated Learning Client Selection Algorithm (TiFLCS-MAR), which employs a multi-attribute reverse auction approach. Integrated into the federated learning framework, TiFLCS-MAR comprehensively evaluates client attributes and applies a tiered strategy to mitigate issues arising from client heterogeneity. Additionally, we introduce the TiFLCS-MAR Pricing Framework (TiFLCS-MARP), which leverages Nash equilibrium principles to maximize profitability for both clients and servers. Our framework accommodates the heterogeneity of diverse clients and efficiently selects suitable candidates from a large pool, thereby boosting training efficiency and curbing model pricing costs. Empirical results demonstrate the efficacy of federated training with TiFLCS-MAR, showing nearly double the convergence speed and a 5-10 percentage point improvement in accuracy across real and synthetic datasets.
Furthermore, when compared to three baseline algorithms, TiFLCS-MARP substantially increases central server revenue by factors of 1.99, 27.05, and 1.78, highlighting its superior performance in the data market context.