Federated learning (FL) is a decentralized machine learning architecture that leverages a large number of remote devices to learn a joint model from distributed training data. However, system heterogeneity is a major obstacle to robust distributed learning performance in an FL network, and it arises from two sources: 1) device heterogeneity, due to the diverse computational capacity among devices, and 2) data heterogeneity, due to the nonidentically distributed data across the network. Prior studies that address heterogeneous FL, for example FedProx, lack a formalization of the problem, which remains open. This work first formalizes the system-heterogeneous FL problem and then proposes a new algorithm, called federated local gradient approximation (FedLGA), which addresses it by bridging the divergence of local model updates via gradient approximation. To achieve this, FedLGA provides an alternated Hessian estimation method, which requires only extra linear complexity on the aggregator. Theoretically, we show that with a device-heterogeneous ratio ρ, FedLGA achieves convergence rates on non-i.i.d. FL training data for nonconvex optimization problems of O((1+ρ)/√(ENT) + 1/T) and O((1+ρ)√E/√(TK) + 1/T) for full and partial device participation, respectively, where E is the number of local learning epochs, T is the total number of communication rounds, N is the total number of devices, and K is the number of devices selected in one communication round under the partial participation scheme. Comprehensive experiments on multiple datasets indicate that FedLGA can effectively address the system-heterogeneous problem and outperform current FL methods. Specifically, on the CIFAR-10 dataset, FedLGA improves the model's best testing accuracy from 60.91% with FedAvg to 64.44%.
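
To give a feel for how the two convergence rates stated above scale, the following minimal sketch evaluates only their leading terms with all hidden constants dropped. The function names and the sample values of ρ, E, N, K, and T are illustrative assumptions, not taken from the paper, so the printed numbers show relative trends and are not actual error bounds.

```python
import math


def full_participation_rate(rho: float, E: int, N: int, T: int) -> float:
    """Leading terms (1 + rho) / sqrt(E*N*T) + 1/T, constants omitted (assumption)."""
    return (1 + rho) / math.sqrt(E * N * T) + 1 / T


def partial_participation_rate(rho: float, E: int, K: int, T: int) -> float:
    """Leading terms (1 + rho) * sqrt(E) / sqrt(T*K) + 1/T, constants omitted (assumption)."""
    return (1 + rho) * math.sqrt(E) / math.sqrt(T * K) + 1 / T


if __name__ == "__main__":
    # Hypothetical settings chosen only for illustration.
    E, N, K = 5, 100, 10
    for rho in (0.0, 0.5, 1.0):
        for T in (100, 1000):
            full = full_participation_rate(rho, E, N, T)
            partial = partial_participation_rate(rho, E, K, T)
            print(f"rho={rho:.1f} T={T:5d} full={full:.5f} partial={partial:.5f}")
```

Under these expressions, a larger ρ inflates both terms linearly through the factor (1+ρ), while more communication rounds T shrink them. In the partial-participation expression, selecting more devices K per round reduces the first term.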