Semi-supervised learning mitigates the scarcity of labeled data by exploiting large amounts of unlabeled data. Although it has been applied successfully to respiratory sound classification, in practice it usually takes years to acquire a sufficiently large unlabeled set, which extends the research timeline. Respiratory sounds are also available from related tasks, such as breath phase detection and COVID-19 detection, so an alternative is to treat these external samples as unlabeled data for respiratory sound classification. However, because these external samples are collected in different scenarios with different devices, a distribution mismatch inevitably exists between the labeled data and the external unlabeled data. Existing methods usually assume that the labeled and unlabeled data follow the same distribution and therefore cannot benefit from external samples. To utilize external unlabeled data, we propose a semi-supervised method based on the Joint Energy-based Model (JEM). During training, the method attempts to model the data distribution using only the essential semantic components of the samples. Because non-semantic components, such as the recording environment and device, have little impact on model training, the method still obtains a relatively accurate distribution estimate when they vary. The method is therefore insensitive to the distribution mismatch, which enables the model to leverage external unlabeled data to mitigate the lack of labeled data. With ICBHI 2017 as the labeled set and HF_Lung_V1 and COVID-19 Sounds as the external unlabeled sets, the proposed method exceeds the baseline by 12.86.
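
For readers unfamiliar with JEM, the sketch below illustrates the general idea that the method builds on: the logits of an ordinary classifier are reused to define both class probabilities p(y|x), via a softmax, and an unnormalized density over inputs, via the energy E(x) = -logsumexp_y f(x)[y]. It shows only this standard JEM reinterpretation, not the training procedure proposed in this paper. The function names, the example logits, and the choice of four classes are assumptions made for illustration.

```python
import numpy as np

def log_softmax(logits):
    """Class log-probabilities log p(y|x) from classifier logits."""
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def energy(logits):
    """JEM energy E(x) = -logsumexp_y f(x)[y]; lower energy means higher
    unnormalized density under the model."""
    m = logits.max(axis=-1)
    return -(m + np.log(np.exp(logits - m[..., None]).sum(axis=-1)))

# Hypothetical logits for two recordings over four illustrative classes.
logits = np.array([[2.0, 0.5, -1.0, 0.0],
                   [0.1, 0.1, 0.1, 0.1]])

class_log_probs = log_softmax(logits)  # usable on labeled samples
energies = energy(logits)              # usable on unlabeled samples too
```

Because the energy needs no label, a term based on it can in principle be evaluated on unlabeled recordings, which is what makes this family of models a natural fit for semi-supervised learning.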