Multi-object tracking (MOT) is a highly valued and challenging research topic in computer vision. To achieve more robust tracking performance, recently published MOT methods tend to use anchor-free object detectors, which have the advantage of dealing with the identity ambiguity problem encountered by anchor-based methods in learning appearance features. However, in practical applications, it is found that the detection accuracy of the anchor-free object detector based on classical convolutional neural networks in crowded scenes will be significantly reduced. In order to have better detection and tracking performance in crowded scenes, this paper proposes an anchor-free joint detection and embedding (JDE) MOT method based on Transformer architecture, called Swin-JDE. The proposed method includes a novel Patch-Expanding module, which can improve the spatial information of feature maps by up-sampling processing through neural network learning and Einops Notation-based rearrangement to enhance the detection and tracking performance of the MOT model. In terms of training method, we propose a two-step training method that trains the detection branch separately from the appearance branch to enhance the detection robustness of anchor-free predictors. Furthermore, during the training process, we also propose an examination method to remove occluded targets from the training dataset to improve the accuracy of the appearance embedding layer. In terms of data association, we propose a new post-processing method, which simultaneously considers the three factors of detection confidence, appearance embedding distance, intersection-over-union (IoU) distance to match each tracklet and the detection information to improve the tracking robustness of the MOT model. Experimental results show that the proposed method achieves 70.38% multiple object tracking accuracy (MOTA) and 69.53% identification F1-score (IDF1) results in the MOT20 benchmark dataset, and the identification switch (ID Switch) is reduced to 2026. Compared with FairMOT, the proposed method improves MOTA and IDF1 by 8.58% and 2.23%, respectively.