Discriminating seafloor substrate types is a fundamental task in seafloor science, and substrate information is important both for the development of marine science and for the protection of the marine environment. Modern sonar equipment can efficiently generate seafloor images that present seafloor information visually, so substrate classification based on sonar images has become an active research topic. Convolutional neural networks (CNNs) are among the most important classification algorithms for seabed substrate sonar images and perform well in most cases. However, the limited size of the convolutional kernel restricts a CNN's ability to extract global features, so its discrimination of global features in sonar images is weak. In addition, labelled seabed substrate sonar images are difficult and costly to acquire, so acoustic seabed substrate classification in practice is generally a small-sample classification problem. To address these problems, this thesis selects Swin Transformer, which has strong global feature extraction ability, as the classifier, and uses MoCo self-supervised learning to pre-train the model on unlabeled data in order to achieve better results.