Abstract We proposed a deep learning (DL) method to derive VS models from joint inversion of Rayleigh-wave dispersions and receiver functions, which is based on multilabel convolutional neural network and recurrent neural network. We used a spline-based approach to generate synthetic models instead of directly using existing models to build the training data set, which improves the generalization of the method. Unlike the traditional methods, which usually set a fixed VP/VS ratio, our method makes full use of the powerful data mining ability of DL to invert the VS models assuming different VP/VS ratios. A loss function is specially designed that focuses on key features of the model space, for example, the shape and depth of Moho. Synthetic tests demonstrate that the proposed method is accurate and fast. Application to the southeast margin of the Tibetan Plateau shows results consistent with the previous joint inversion with P constraints, indicating the proposed method is reliable and robust.