ABSTRACTThe total masses of galaxy clusters characterize many aspects of astrophysics and the underlying cosmology. It is crucial to obtain reliable and accurate mass estimates for numerous galaxy clusters over a wide range of redshifts and mass scales. We present a transfer-learning approach to estimate cluster masses using the ugriz-band images in the SDSS Data Release 12. The target masses are derived from X-ray or SZ measurements that are only available for a small subset of the clusters. We designed a semisupervised deep learning model consisting of two convolutional neural networks. In the first network, a feature extractor is trained to classify the SDSS photometric bands. The second network takes the previously trained features as inputs to estimate their total masses. The training and testing processes in this work depend purely on real observational data. Our algorithm reaches a mean absolute error (MAE) of 0.232 dex on average and 0.214 dex for the best fold. The performance is comparable to that given by redMaPPer, 0.192 dex. We have further applied a joint integrated gradient and class activation mapping method to interpret such a two-step neural network. The performance of our algorithm is likely to improve as the size of training data set increases. This proof-of-concept experiment demonstrates the potential of deep learning in maximizing the scientific return of the current and future large cluster surveys.