The classical problem of supervised learning is to infer an accurate estimate of a target variable Y from a measured variable X using a set of labeled training samples. Motivated by the increasingly distributed nature of data and decision making, this paper considers a variation of this classical problem in which the inference is distributed between two nodes, e.g., a mobile device and a cloud, with a rate constraint on the communication between them. The mobile device observes X and sends a description M of X to the cloud, which computes an estimate Ŷ of Y. We follow the recent minimax learning approach to study this inference problem and show that it corresponds to a one-shot minimax noisy lossy source coding problem. We then establish information-theoretic bounds on the risk-rate Lagrangian cost, leading to a general method for designing a near-optimal descriptor-estimator pair. A key ingredient in the proof of our result is a refined version of the strong functional representation lemma, which was previously used to establish several one-shot source coding theorems. Our results show that a naive estimate-compress scheme for rate-constrained inference is not optimal in general. When the distribution of (X, Y) is known and the error is measured by the logarithmic loss, our bounds on the risk-rate Lagrangian cost provide a new one-shot operational interpretation of the information bottleneck. We also demonstrate a way to bound the excess risk of the descriptor-estimator pair obtained by our method.
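For concreteness, the risk-rate Lagrangian cost referred to above can be sketched as follows; the notation (loss ℓ, multiplier λ, description length |M|, auxiliary representation U) is illustrative and is not fixed by the abstract itself. The descriptor-estimator pair is evaluated by
\[
  L_\lambda \;=\; \mathbb{E}\big[\ell(Y,\hat{Y})\big] \;+\; \lambda\,\mathbb{E}\big[|M|\big],
\]
where |M| denotes the length of the description in bits. Under logarithmic loss and a known distribution of (X, Y), the corresponding single-letter quantity takes the form
\[
  \min_{p_{U\mid X}} \; H(Y\mid U) \;+\; \lambda\, I(X;U),
\]
which, up to an additive constant, is an information bottleneck Lagrangian; this is the sense in which bounds on the one-shot Lagrangian cost yield an operational interpretation of the information bottleneck.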