Abstract

The enormous volume of data distributed at the network edge, together with ubiquitous connectivity, has given rise to the new paradigm of distributed machine learning and large-scale data analytics. Distributed principal component analysis (PCA) concerns finding a low-dimensional subspace that captures the most important information in high-dimensional data distributed over the network edge; the subspace is useful for distributed data compression and feature extraction. This work advocates applying over-the-air federated learning to the efficient implementation of distributed PCA in a wireless network under a data-privacy constraint, a framework termed AirPCA. The design exploits the waveform-superposition property of a multi-access channel to realize over-the-air aggregation of local subspace updates that are computed and simultaneously transmitted by devices to a server, thereby reducing multi-access latency. The traditional drawback of this class of techniques, namely channel-noise perturbation of uncoded analog-modulated signals, is turned into a mechanism for escaping saddle points during stochastic gradient descent (SGD) in the AirPCA algorithm. As a result, the convergence of the AirPCA algorithm is accelerated. To materialize the idea, descent speeds in different types of descent regions are analyzed mathematically using martingale theory, accounting for wireless propagation effects, namely channel fading and noise, and for transmission techniques including broadband transmission and over-the-air aggregation. The results reveal that noise accelerates descent in saddle regions but has the opposite effect in other types of regions. This insight is applied to designing an online scheme that adapts the receive signal power to the type of the current descent region: the scheme amplifies the noise effect in saddle regions by reducing signal power and applies the resulting power savings to suppress that effect in other regions.
Experiments on real datasets show that such power control accelerates convergence while achieving the same convergence accuracy as in the ideal case of centralized PCA.
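As a toy illustration of the noise-aided saddle escape described above (a minimal sketch only, not the AirPCA algorithm itself; the objective, step size, and noise level are illustrative assumptions), the following compares noiseless and noise-perturbed gradient descent on a function with a saddle point:

```python
import numpy as np

# Toy objective f(x, y) = x^2 - y^2, which has a saddle at the origin.
# Noiseless gradient descent started exactly on the saddle stays there,
# while injected noise -- playing the role of channel noise in AirPCA --
# kicks the iterate off the unstable direction so it can descend along -y^2.

def grad(w):
    """Gradient of f(x, y) = x^2 - y^2."""
    x, y = w
    return np.array([2.0 * x, -2.0 * y])

def descend(w0, steps=200, lr=0.1, noise_std=0.0, seed=0):
    """Gradient descent with optional additive Gaussian perturbation."""
    rng = np.random.default_rng(seed)
    w = np.array(w0, dtype=float)
    for _ in range(steps):
        w = w - lr * grad(w) + noise_std * rng.standard_normal(2)
    return w

# Start exactly at the saddle point.
w_clean = descend([0.0, 0.0], noise_std=0.0)   # remains pinned at the saddle
w_noisy = descend([0.0, 0.0], noise_std=0.01)  # escapes along the y-direction
print(w_clean, w_noisy)
```

The noiseless run stays at the saddle because the gradient there is exactly zero, whereas even small perturbations grow geometrically along the unstable direction; this mirrors the paper's point that channel noise can be exploited, rather than merely suppressed, in saddle regions.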
