Deep support vector data description (Deep SVDD) combines a deep mapping network with support vector data description (SVDD) to jointly optimize the network connection weights and the hypersphere volume. However, when the parameters of the deep mapping network are set improperly, Deep SVDD can suffer from hypersphere collapse, in which all input data are mapped to the hypersphere center. To overcome hypersphere collapse and improve the feature learning ability of the deep mapping network, deep multi-sphere SVDD based on disentangled representation learning (DMSVDD-DRL) is proposed. DMSVDD-DRL consists of a variational autoencoder (VAE) and multiple hyperspheres. The feature representations obtained by the VAE are disentangled into discriminative representations and generative representations, which follow a mixture of t-distributions and a Gaussian distribution, respectively. In the pre-training phase of DMSVDD-DRL, the network parameters and the hypersphere centers are initialized. In the training phase, augmented data are added to the training set, and the discriminative representations of both the input and augmented data are generated by the mapping network. Multiple hyperspheres are then constructed in the feature space from the obtained discriminative representations. Finally, five terms are jointly minimized to obtain the optimal network connection weights and the multiple minimum-volume hyperspheres: the VAE loss of the input data, the reconstruction error of the augmented data, the augmentation loss between the input and augmented data, the average radius of the multiple hyperspheres, and the average distance from the discriminative representations to their corresponding hypersphere centers. The effectiveness of DMSVDD-DRL is validated through comparative and ablation experiments on benchmark data sets. In addition, DMSVDD-DRL is shown to be more robust to outliers than related methods.
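
The two hypersphere terms of the objective (average radius and average distance to the corresponding center) can be illustrated with a minimal NumPy sketch. It is not the implementation used in this work. The function names `multi_sphere_terms` and `anomaly_score`, the nearest-center assignment rule, the use of squared radii and squared Euclidean distances, and the equal weighting of the terms are all assumptions made for illustration. The VAE loss, reconstruction error, and augmentation loss are omitted.

```python
import numpy as np


def _squared_distances(z, centers):
    # Pairwise squared Euclidean distances, shape (n_samples, n_spheres).
    diff = z[:, None, :] - centers[None, :, :]
    return (diff ** 2).sum(axis=2)


def multi_sphere_terms(z, centers, radii):
    """Return (average squared radius, average squared distance, assignments).

    z       : (n, d) discriminative representations
    centers : (K, d) hypersphere centers
    radii   : (K,)   hypersphere radii
    Assumption: each representation belongs to its nearest center.
    """
    d2 = _squared_distances(z, centers)
    assign = d2.argmin(axis=1)
    avg_distance = d2[np.arange(len(z)), assign].mean()
    avg_radius = np.mean(radii ** 2)
    return avg_radius, avg_distance, assign


def anomaly_score(z, centers, radii):
    # Positive score: the point lies outside its assigned hypersphere.
    d2 = _squared_distances(z, centers)
    assign = d2.argmin(axis=1)
    return d2[np.arange(len(z)), assign] - radii[assign] ** 2


if __name__ == "__main__":
    z = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 10.0], [10.0, 11.0]])
    centers = np.array([[0.0, 0.0], [10.0, 10.0]])
    radii = np.array([1.0, 1.0])
    print(multi_sphere_terms(z, centers, radii))
    print(anomaly_score(np.array([[5.0, 5.0]]), centers, radii))
```

In this sketch, hypersphere collapse corresponds to the degenerate case in which every row of `z` equals a center, so the distance term is trivially zero regardless of the input. The additional loss terms listed above are what is meant to rule out that solution.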