Communication overhead is a major bottleneck in federated learning (FL), especially in wireless environments, due to limited data rates and unstable radio channels. This communication challenge necessitates holistic selection of participating clients that accounts for both computation needs and communication cost, as well as judicious allocation of the limited transmission resources. Meanwhile, the random, unpredictable nature of both the training data samples and the communication channels calls for an online optimization approach that adapts to the changing system state over time. In this work, we consider a general framework of online joint client sampling and power allocation for wireless FL under time-varying communication channels. We formulate it as a stochastic network optimization problem that admits a Lyapunov-type solution approach. This leads to per-training-round subproblems with a special bi-convex structure, which we leverage to derive globally optimal solutions, culminating in a meta-algorithm with strong performance guarantees. We further study three specific FL problems covering multiple scenarios: IID or non-IID data, with or without robustness to data drift, and unbiased or biased client sampling. We derive detailed algorithms for each of these problems. Simulations with standard classification tasks demonstrate that the proposed communication-aware algorithms outperform their counterparts under a wide range of learning and communication scenarios.
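To make the flavor of a per-round joint sampling/power subproblem concrete, the following is a minimal toy sketch, not the paper's actual formulation: each client gets the smallest transmit power meeting a hypothetical per-round deadline, and sampling probabilities then trade a variance-reduction weight against expected transmit energy via a Lyapunov-style penalty parameter V. All weights, channel gains, and the objective form are illustrative assumptions.

```python
import math

def power_step(g, D, B, T):
    """Smallest power that delivers D bits over bandwidth B within deadline T
    on a channel with gain g, assuming a Shannon rate B*log2(1 + g*p).
    (Hypothetical channel model for illustration.)"""
    return (2.0 ** (D / (B * T)) - 1.0) / g

def sampling_step(c, energy, V):
    """Minimize c_i/q_i + V*q_i*energy_i over q_i in (0, 1]:
    the unconstrained optimum is sqrt(c_i / (V*energy_i)), clipped to 1.
    (Toy objective standing in for a variance/energy trade-off.)"""
    return [min(1.0, math.sqrt(ci / (V * ei))) for ci, ei in zip(c, energy)]

def per_round_solve(c, g, D, B, T, V):
    # Power half of the subproblem: deadline-feasible minimum powers.
    p = [power_step(gi, D, B, T) for gi in g]
    # Energy each client would spend if it transmits for the full deadline.
    energy = [pi * T for pi in p]
    # Sampling half: closed-form probabilities balancing variance vs. energy.
    q = sampling_step(c, energy, V)
    return q, p

# Example: 3 clients with assumed importance weights and channel gains.
c = [1.0, 0.5, 2.0]   # variance-reduction weights (assumed)
g = [1.0, 0.2, 5.0]   # channel gains (assumed)
q, p = per_round_solve(c, g, D=1e6, B=1e6, T=2.0, V=10.0)
```

A larger V penalizes energy more heavily and drives the sampling probabilities down, while clients with strong channels (low energy cost) or high importance weights are sampled with probability clipped at 1.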