This study aimed to establish a multimodal deep-learning network model to enhance the diagnosis of benign and malignant pulmonary ground-glass nodules (GGNs). Retrospective data on pulmonary GGNs were collected from multiple centers across North, Northeast, Northwest, South, and Southwest China and divided into a training set and a validation set in an 8:2 ratio. In addition, a GGN dataset obtained from our hospital database was used as the test set. All patients underwent chest computed tomography (CT), and the final diagnosis of each nodule was based on the postoperative pathological report. A Residual Network (ResNet) was used to extract imaging features, the Word2Vec method was used for semantic information extraction, and a self-attention mechanism was used to combine the imaging features with patient data to construct a multimodal classification model. The diagnostic performance of the proposed multimodal model was then compared with that of existing ResNet and VGG models and of radiologists. The multicenter dataset comprised 1020 GGNs (265 benign and 755 malignant), and the test dataset comprised 204 GGNs (67 benign and 137 malignant). In the validation set, the proposed multimodal model achieved an accuracy of 90.2%, a sensitivity of 96.6%, and a specificity of 75.0%, surpassing the VGG (73.1%, 76.7%, and 66.5%) and ResNet (78.0%, 83.3%, and 65.8%) models in diagnosing benign and malignant nodules. In the test set, the multimodal model correctly diagnosed 125 (91.18%) of the malignant nodules, outperforming the radiologists (80.37% accuracy), and correctly identified 54 benign nodules (80.70% accuracy), compared with the radiologists' accuracy of 85.47%. The consistency test comparing the radiologists' and the multimodal model's diagnoses against postoperative pathology showed strong agreement, with the multimodal model aligning more closely with the gold-standard pathological findings (Kappa = 0.720, P < 0.01). The multimodal deep-learning network model therefore exhibited promising diagnostic performance in distinguishing benign from malignant GGNs and holds potential as a reference tool to assist radiologists, improving both the diagnostic accuracy of GGNs and work efficiency in clinical settings.
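
As a reading aid, the sketch below shows one way the described fusion (ResNet imaging features plus Word2Vec-style clinical embeddings combined by self-attention) could be implemented. It assumes PyTorch and torchvision; the layer sizes, ResNet depth, number of attention heads, and the class name MultimodalGGNClassifier are illustrative assumptions, not the architecture reported in the study.

```python
# Minimal sketch (PyTorch assumed): fusing ResNet imaging features with
# Word2Vec-style clinical embeddings via self-attention. Dimensions, layer
# choices, and names are illustrative assumptions, not the authors' design.
import torch
import torch.nn as nn
from torchvision.models import resnet18

class MultimodalGGNClassifier(nn.Module):
    def __init__(self, clinical_dim=300, embed_dim=256, num_classes=2):
        super().__init__()
        # Imaging branch: ResNet backbone with its classification head removed.
        backbone = resnet18(weights=None)
        backbone.fc = nn.Identity()               # yields a 512-dim feature vector
        self.image_encoder = backbone
        self.image_proj = nn.Linear(512, embed_dim)
        # Clinical branch: projects a pre-computed Word2Vec-style embedding
        # of the patient data into the same feature space.
        self.clinical_proj = nn.Linear(clinical_dim, embed_dim)
        # Self-attention over the two modality tokens fuses the features.
        self.attention = nn.MultiheadAttention(embed_dim, num_heads=4, batch_first=True)
        self.classifier = nn.Linear(embed_dim, num_classes)

    def forward(self, ct_patch, clinical_vec):
        img_tok = self.image_proj(self.image_encoder(ct_patch))   # (B, embed_dim)
        cli_tok = self.clinical_proj(clinical_vec)                # (B, embed_dim)
        tokens = torch.stack([img_tok, cli_tok], dim=1)           # (B, 2, embed_dim)
        fused, _ = self.attention(tokens, tokens, tokens)         # self-attention fusion
        pooled = fused.mean(dim=1)                                # pool modality tokens
        return self.classifier(pooled)                            # benign vs. malignant logits

# Example forward pass with random tensors standing in for a CT patch
# and a 300-dim clinical embedding.
model = MultimodalGGNClassifier()
logits = model(torch.randn(2, 3, 224, 224), torch.randn(2, 300))
print(logits.shape)  # torch.Size([2, 2])
```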
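
The agreement analysis reported above corresponds to a kappa consistency test against the pathological gold standard. The snippet below shows how such a check can be computed with scikit-learn; the labels are made up for illustration and are not the study's data.

```python
# Illustrative Cohen's kappa agreement check (scikit-learn assumed).
from sklearn.metrics import cohen_kappa_score

pathology  = [1, 1, 0, 1, 0, 1, 1, 0]   # 1 = malignant, 0 = benign (gold standard)
model_call = [1, 1, 0, 1, 0, 1, 0, 0]   # model's predictions on the same nodules
print(cohen_kappa_score(pathology, model_call))  # agreement beyond chance
```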