Abstract

Non-small cell lung carcinoma (NSCLC) accounts for 80–85% of all lung cancers, and lung cancer is the leading cause of death among malignant tumors. Early and accurate diagnosis of NSCLC is crucial for subsequent treatment and helps improve patients' survival time and quality of life. Machine learning methods based on gene expression profile data offer a new route to the early diagnosis of NSCLC. However, three difficulties arise in practical application: data noise, overlapping gene grouping, and evaluating the importance of gene groups and individual genes. This paper addresses these problems by developing an adaptive group lasso regularized multinomial regression (AGLRMR). Robust principal component analysis (RPCA) was used to separate clean data from noisy NSCLC gene expression profile data. Weighted gene co-expression network analysis (WGCNA) was adopted to perform overlapping gene grouping. Importance evaluation criteria for individual genes and gene groups were proposed by fusing module membership, information theory, and symmetric uncertainty. Compared with RAMRSGL, AMRSOGL, MSGL, MRGL, and MR-lasso, AGLRMR improved the diagnostic accuracy for NSCLC by 1.2%, 2.3%, 2.2%, 2.5%, and 4.0% on the NSCLC-1 dataset, and by 1.1%, 8.7%, 6.2%, 7.1%, and 16.3% on the NSCLC-2 dataset, respectively. The running speed of AGLRMR is comparable to that of AMRSOGL and slower than that of the other four methods. Biological analysis showed that the genes selected by AGLRMR were highly correlated with NSCLC.
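
As an illustration of one ingredient named above, symmetric uncertainty between a discretized gene and the class label can be computed from Shannon entropies as SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)). The sketch below shows only this standard definition; it is not the paper's fused importance criterion, whose exact form is not given in the abstract. The function names, the low/high discretization, and the toy data are assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def symmetric_uncertainty(x, y):
    """Standard symmetric uncertainty: 2 * I(X; Y) / (H(X) + H(Y))."""
    h_x = entropy(x)
    h_y = entropy(y)
    h_xy = entropy(list(zip(x, y)))   # joint entropy H(X, Y)
    mutual_info = h_x + h_y - h_xy    # I(X; Y)
    denom = h_x + h_y
    # Convention: return 0 when both variables are constant.
    return 0.0 if denom == 0 else 2.0 * mutual_info / denom


# Hypothetical toy data: one discretized gene and the class labels.
gene = ["low", "low", "high", "high"]
label = [0, 0, 1, 1]
su = symmetric_uncertainty(gene, label)
```

A value near 1 means the discretized gene is highly informative about the label, and a value near 0 means the two are close to independent. Continuous expression values would need to be discretized before applying this sketch.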
