Abstract

We present a deep machine-learning (ML) approach to constraining cosmological parameters with multiwavelength observations of galaxy clusters. The ML approach has two components: an encoder that builds a compressed representation of each galaxy cluster, and a flexible convolutional neural network that estimates the cosmological model from a cluster sample. It is trained and tested on simulated cluster catalogs built from the Magneticum simulations. From the simulated catalogs, the ML method estimates the amplitude of matter fluctuations, σ8, at approximately the expected theoretical limit. More importantly, the deep ML approach can be interpreted. We lay out three schemes for interpreting the ML technique: a leave-one-out method for assessing cluster importance, an average saliency for evaluating feature importance, and correlations in the terse layer for understanding whether an ML technique can be safely applied to observational data. These interpretation schemes led to the discovery of a previously unknown self-calibration mode for flux- and volume-limited cluster surveys. We describe this new mode, which uses the amplitude and peak of the cluster mass probability density function as anchors for mass calibration. We introduce the term "overspecialized" to describe a common pitfall in astronomical applications of ML, in which the ML method learns simulation-specific details, and we show how a carefully constructed architecture can be used to check for this source of systematic error.
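As a concrete illustration of the two-component design described above, the sketch below pairs a per-cluster encoder that produces a compressed ("terse") vector with a small network that regresses σ8 from the pooled encodings of a cluster sample, followed by a gradient-based average saliency in the spirit of the second interpretation scheme. This is a minimal sketch assuming PyTorch; all class names, layer sizes, input shapes, and the mean-pooling choice are illustrative assumptions, not the architecture or saliency definition used in the paper.

```python
# Hypothetical sketch of the two-component design: a per-cluster encoder
# producing a compressed ("terse") representation, followed by a head that
# estimates sigma_8 from a whole cluster sample. Layer sizes, input shapes,
# and mean pooling are illustrative assumptions only.
import torch
import torch.nn as nn

class ClusterEncoder(nn.Module):
    """Compress one cluster's multiwavelength image into a short vector."""
    def __init__(self, n_channels=2, terse_dim=8):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(n_channels, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),      # collapse spatial dimensions
        )
        self.fc = nn.Linear(32, terse_dim)  # the compressed "terse" layer

    def forward(self, x):                 # x: (n_clusters, channels, H, W)
        return self.fc(self.conv(x).flatten(1))

class CosmologyEstimator(nn.Module):
    """Regress sigma_8 from the pooled encodings of a cluster sample."""
    def __init__(self, terse_dim=8):
        super().__init__()
        self.encoder = ClusterEncoder(terse_dim=terse_dim)
        self.head = nn.Sequential(nn.Linear(terse_dim, 32), nn.ReLU(),
                                  nn.Linear(32, 1))

    def forward(self, sample):            # sample: (n_clusters, channels, H, W)
        codes = self.encoder(sample)      # one terse vector per cluster
        pooled = codes.mean(dim=0)        # permutation-invariant sample summary
        return self.head(pooled)          # scalar sigma_8 estimate

# Toy usage: a "sample" of 100 clusters with 2-band 64x64 images.
model = CosmologyEstimator()
sample = torch.randn(100, 2, 64, 64, requires_grad=True)
sigma8_hat = model(sample)

# Hedged illustration of an average-saliency interpretation: the gradient of
# the sigma_8 estimate with respect to the input images, averaged over the
# sample, gives a per-pixel feature-importance map.
sigma8_hat.squeeze().backward()
saliency = sample.grad.abs().mean(dim=0)  # shape: (channels, H, W)
```

Mean pooling over cluster encodings is one simple way to make the sample-level estimate independent of cluster ordering; the paper's actual aggregation may differ.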