Causal discovery, the inference of causal relations among variables from data, is a fundamental problem in science. Growing awareness of data privacy has shifted data collection, processing, and storage toward distributed settings. To meet the resulting need for distributed causal discovery, we propose a novel federated directed acyclic graph (DAG) learning method, distributed annealing on regularized likelihood score (DARLS), which learns a causal graph from data stored on multiple clients. DARLS simulates an annealing process to search over the space of topological sorts, where the optimal graphical structure compatible with a sort is found by distributed optimization. This distributed optimization relies on multiple rounds of communication between local clients and a central server to estimate the graphical structure, and we establish its convergence to the solution obtained by an oracle with access to all the data. To the best of our knowledge, DARLS is the first distributed method for learning causal graphs with such finite-sample oracle guarantees. To establish the consistency of DARLS, we also derive new identifiability results for causal graphs parameterized by generalized linear models, which may be of independent interest. Through extensive simulation studies and a real-world application, we show that DARLS outperforms existing federated learning methods and performs comparably to oracle methods on pooled data, demonstrating its effectiveness in estimating causal networks from distributed data.
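To make the search over topological sorts concrete, the following is a minimal Python sketch of annealing over orderings with a score assembled from client-side summaries. It is not the DARLS implementation. As simplifying assumptions, it uses a zero-mean linear Gaussian model with equal noise variances in place of generalized linear models, it regresses each variable on all of its predecessors with no sparsity penalty, and it replaces the multi-round distributed optimization with a single exchange of Gram matrices. The names `client_summary`, `order_score`, and `anneal_order` are hypothetical.

```python
import math
import random
import numpy as np

def client_summary(X):
    """Runs on a client: share only the Gram matrix and row count, never raw rows."""
    return X.T @ X, X.shape[0]

def order_score(order, gram, n):
    """Per-sample residual sum of squares when each variable is regressed on
    all variables that precede it in `order` (lower is better).
    Assumption: zero-mean data and equal noise variances, so the score
    distinguishes orderings."""
    total = 0.0
    for pos, j in enumerate(order):
        preds = list(order[:pos])
        rss = gram[j, j]
        if preds:
            g = gram[preds, j]
            # Least-squares fit expressed through the pooled Gram matrix.
            rss -= g @ np.linalg.solve(gram[np.ix_(preds, preds)], g)
        total += rss / n
    return total

def anneal_order(gram, n, n_iter=500, t0=1.0, cooling=0.98, seed=0):
    """Simulated annealing over orderings using adjacent-swap proposals."""
    rng = random.Random(seed)
    p = gram.shape[0]
    current = list(range(p))
    rng.shuffle(current)
    cur_score = order_score(current, gram, n)
    best, best_score = current[:], cur_score
    temp = t0
    for _ in range(n_iter):
        i = rng.randrange(p - 1)
        cand = current[:]
        cand[i], cand[i + 1] = cand[i + 1], cand[i]
        cand_score = order_score(cand, gram, n)
        delta = cand_score - cur_score
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, cur_score = cand, cand_score
            if cur_score < best_score:
                best, best_score = current[:], cur_score
        temp *= cooling  # geometric cooling schedule (an assumed choice)
    return best, best_score
```

In this sketch the server would sum the Gram matrices and row counts returned by `client_summary` across clients and pass the totals to `anneal_order`. Because Gram matrices add across clients, the pooled score here equals what an oracle holding all rows would compute; for the likelihoods considered in the paper no such one-shot summary is assumed, which is why multiple communication rounds are needed there.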