Predictive models for assessing the risk of developing lung cancers can help identify high-risk individuals with the aim of recommending further screening and early intervention. To facilitate pre-hospital self-assessments, some studies have exploited predictive models trained on non-clinical data (e.g., smoking status and family history). The performance of these models is limited due to not considering clinical data (e.g., blood test and medical imaging results). Deep learning has shown the potential in processing complex data that combine both clinical and non-clinical information. However, predicting lung cancers remains difficult due to the severe lack of positive samples among follow-ups. To tackle this problem, this paper presents a generative-discriminative framework for improving the ability of deep learning models to generalize. According to the proposed framework, two nonlinear generative models, one based on the generative adversarial network and another on the variational autoencoder, are used to synthesize auxiliary positive samples for the training set. Then, several discriminative models, including a deep neural network (DNN), are used to assess the lung cancer risk based on a comprehensive list of risk factors. The framework was evaluated on over 55 000 subjects questioned between January 2014 and December 2017, with 699 subjects being clinically diagnosed with lung cancer between January 2014 and August 2019. According to the results, the best performing predictive model built using the proposed framework was based on DNN. It achieved an average sensitivity of 76.54% and an area under the curve of 69.24% in distinguishing between the cases of lung cancer and normal cases on test sets.
Read full abstract