Objective: Patients with indeterminate pulmonary nodules (IPNs) that have an intermediate to high probability of lung cancer generally undergo invasive diagnostic procedures. Chest computed tomography (CT) images and clinical data have been used to estimate the pretest probability of lung cancer. In this study, we apply a deep learning network to integrate multi-modal data from CT images and clinical data (including blood-based biomarkers) to improve lung cancer diagnosis. Our goal is to reduce uncertainty and to avoid morbidity, mortality, and over- and undertreatment of patients with IPNs.

Method: We use a retrospective study design with cross-validation and external validation across four different sites. We introduce a deep learning framework with a two-path structure that learns from CT images and clinical data. The proposed model can learn and predict from a single modality when the multi-modal data are incomplete. We use 1284 patients in the learning cohort for model development. Three external sites (with 155, 136, and 96 patients, respectively) provided patient data for external validation. We compare our model with widely applied clinical prediction models (the Mayo and Brock models) and image-only methods (e.g., the Liao et al. model).

Results: Our co-learning model improves upon the performance of the clinical-factor-only (Mayo and Brock) and image-only (Liao et al.) models both in cross-validation on the learning cohort (e.g., AUC: 0.787 (ours) vs. 0.707–0.719 (baselines), results reported in the validation fold) and in external validation on three datasets: University of Pittsburgh Medical Center (e.g., 0.918 (ours) vs. 0.828–0.886 (baselines)), Detection of Early Cancer Among Military Personnel (e.g., 0.712 (ours) vs. 0.576–0.709 (baselines)), and University of Colorado Denver (e.g., 0.847 (ours) vs. 0.679–0.746 (baselines)). In addition, our model achieves better re-classification performance than the Mayo model (cNRI 0.04 to 0.20) in all cross- and external-validation sets.
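The two metrics reported in the results, AUC and the categorical net reclassification index (cNRI), can be illustrated with a short sketch. It assumes the common pairwise (Mann-Whitney) definition of AUC and the standard two-sided cNRI (net upward category moves among cancers plus net downward moves among benign nodules, relative to a reference model such as Mayo). The risk cut-offs (0.05 and 0.65), the function names, and the toy data are hypothetical and chosen only for illustration; the abstract does not state which thresholds the study used.

```python
from bisect import bisect_right
from typing import Sequence


def auc(y: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise (Mann-Whitney) AUC: P(cancer score > benign score); ties count 1/2."""
    pos = [s for label, s in zip(y, scores) if label == 1]
    neg = [s for label, s in zip(y, scores) if label == 0]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            wins += 1.0 if sp > sn else 0.5 if sp == sn else 0.0
    return wins / (len(pos) * len(neg))


def risk_category(p: float, thresholds: Sequence[float]) -> int:
    """Map a probability to a category index; a value equal to a cut-off goes to the higher category."""
    return bisect_right(thresholds, p)


def categorical_nri(y: Sequence[int], p_old: Sequence[float],
                    p_new: Sequence[float], thresholds: Sequence[float]) -> float:
    """Two-sided categorical NRI of the new model relative to the old (reference) model."""
    up_ev = down_ev = up_ne = down_ne = 0
    n_ev = n_ne = 0
    for label, po, pn in zip(y, p_old, p_new):
        c_old = risk_category(po, thresholds)
        c_new = risk_category(pn, thresholds)
        if label == 1:  # cancer: moving to a higher category is a correct reclassification
            n_ev += 1
            up_ev += c_new > c_old
            down_ev += c_new < c_old
        else:  # benign: moving to a lower category is a correct reclassification
            n_ne += 1
            up_ne += c_new > c_old
            down_ne += c_new < c_old
    return (up_ev - down_ev) / n_ev + (down_ne - up_ne) / n_ne


# Toy data (not from the study): 1 = cancer, 0 = benign.
y = [1, 1, 1, 0, 0, 0]
p_reference = [0.30, 0.30, 0.70, 0.30, 0.30, 0.02]  # e.g., a clinical model
p_candidate = [0.80, 0.30, 0.70, 0.02, 0.30, 0.02]  # e.g., a co-learning model
cutoffs = [0.05, 0.65]  # hypothetical low / intermediate / high boundaries

print(round(auc(y, p_candidate), 4))  # 0.9444
print(round(categorical_nri(y, p_reference, p_candidate, cutoffs), 4))  # 0.6667
```

In the toy data, one cancer moves up from the intermediate to the high category and one benign nodule moves down from intermediate to low, so each half of the cNRI contributes 1/3. Because the cNRI depends only on category changes, its value depends directly on the chosen cut-offs.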
Conclusions: Lung cancer risk estimation in patients with IPNs can benefit from the co-learning of CT image and clinical data. Learning from more subjects, even those with only a single modality, can improve prediction accuracy. An integrated deep learning model can achieve reasonable discrimination and re-classification performance.
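The idea of a two-path model that can still train and predict when one modality is missing can be sketched as follows. This is a hypothetical illustration, not the authors' architecture: each "path" is reduced to a linear score (in practice the image path would be a deep network over the CT volume), a fusion head combines the two scores when both are present, and each subject's loss sums binary cross-entropy over every head its available data can reach. That loss masking is what lets single-modality subjects still contribute to training. All class names, weights, and the loss design are assumptions.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def bce(p: float, y: int, eps: float = 1e-7) -> float:
    """Binary cross-entropy for a single predicted probability."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


@dataclass
class TwoPathModel:
    """Hypothetical two-path model: one linear path per modality plus a fusion head."""
    w_img: List[float]
    b_img: float
    w_clin: List[float]
    b_clin: float
    w_fuse: List[float]  # weights on [image score, clinical score]
    b_fuse: float

    def path_scores(self, img: Optional[Sequence[float]],
                    clin: Optional[Sequence[float]]) -> Tuple[Optional[float], Optional[float]]:
        # A missing modality yields None instead of a score.
        s_img = None if img is None else sum(w * x for w, x in zip(self.w_img, img)) + self.b_img
        s_clin = None if clin is None else sum(w * x for w, x in zip(self.w_clin, clin)) + self.b_clin
        return s_img, s_clin

    def predict(self, img=None, clin=None) -> float:
        s_img, s_clin = self.path_scores(img, clin)
        if s_img is not None and s_clin is not None:  # both modalities: use the fusion head
            return sigmoid(self.w_fuse[0] * s_img + self.w_fuse[1] * s_clin + self.b_fuse)
        if s_img is not None:  # image only
            return sigmoid(s_img)
        if s_clin is not None:  # clinical data only
            return sigmoid(s_clin)
        raise ValueError("at least one modality is required")

    def loss(self, y: int, img=None, clin=None) -> float:
        """Sum of BCE terms over every head this subject's available data can reach."""
        s_img, s_clin = self.path_scores(img, clin)
        terms = []
        if s_img is not None:
            terms.append(bce(sigmoid(s_img), y))
        if s_clin is not None:
            terms.append(bce(sigmoid(s_clin), y))
        if s_img is not None and s_clin is not None:
            terms.append(bce(self.predict(img, clin), y))
        if not terms:
            raise ValueError("at least one modality is required")
        return sum(terms)


model = TwoPathModel(w_img=[1.0, -0.5], b_img=0.0, w_clin=[0.5], b_clin=-1.0,
                     w_fuse=[1.0, 1.0], b_fuse=0.0)
print(model.predict(img=[2.0, 0.0], clin=[4.0]))  # fused: sigmoid(3.0)
print(model.predict(img=[2.0, 0.0]))              # image path only: sigmoid(2.0)
print(model.predict(clin=[4.0]))                  # clinical path only: sigmoid(1.0)
```

A weighted sum of the per-head losses, or learned imputation of the missing modality, would be equally plausible design choices; the sketch only shows why incomplete subjects need not be discarded.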