Abstract

Background: Few studies have investigated prognostic biomarkers of distant metastases of lung cancer. One of the central difficulties in identifying biomarkers from microarray data is the availability of only a small number of samples, which results in overtraining. Recently obtained evidence reveals that epithelial–mesenchymal transition (EMT) of tumor cells causes metastasis, which is detrimental to patients’ survival.

Results: This work proposes a novel optimization approach to discovering EMT-related prognostic biomarkers that predict the distant metastasis of lung cancer using both microarray and survival data. The weighted objective function maximizes both the accuracy of prediction of distant metastasis and the area between the disease-free survival curves of the non-distant-metastasis and distant-metastasis groups. Data from 78 patients with lung cancer, followed for 120 months, are used to identify a set of gene markers, and an independent cohort of 26 patients is used to evaluate the identified biomarkers. The medical records of the 78 patients show a significant difference between the disease-free survival times of the 37 non-distant-metastasis and the 41 distant-metastasis patients. The experimental results are as follows: 1) the use of disease-free survival curves can compensate for the shortage of samples and increase the test accuracy by 11.10%; and 2) a support vector machine with a set of 17 transcripts, such as CCL16 and CDKN2AIP, yields a leave-one-out cross-validation accuracy of 93.59%, a test accuracy of 76.92%, a large disease-free survival area of 74.81%, and a mean survival prediction error of 3.99 months. The identified putative biomarkers are examined using related studies and signaling pathways to reveal their potential effectiveness in prospective confirmatory studies.

Conclusions: The proposed optimization approach to identifying prognostic biomarkers by combining multiple sources of data (microarray and survival) can facilitate the accurate selection of biomarkers that are most relevant to the disease while solving the problem of insufficient samples.

Electronic supplementary material: The online version of this article (doi:10.1186/s12859-015-0463-x) contains supplementary material, which is available to authorized users.
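The abstract describes the objective only in words, so the following is a minimal sketch of how such a weighted score could be computed, not the authors' implementation. It assumes the two terms are combined as a convex combination with weight w, that the survival curves are Kaplan-Meier estimates, and that the area between them is normalized by the follow-up horizon so it lies in [0, 1]. The function names `km_survival`, `survival_area`, and `weighted_objective` are hypothetical.

```python
import numpy as np

def km_survival(times, events, grid):
    """Kaplan-Meier estimate of disease-free survival evaluated on `grid`.

    times  : follow-up time of each patient (months)
    events : 1 if relapse was observed at that time, 0 if censored
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    grid = np.asarray(grid, dtype=float)
    event_times = np.unique(times[events])
    values = [1.0]  # survival is 1 before the first observed event
    s = 1.0
    for t in event_times:
        at_risk = np.sum(times >= t)
        relapsed = np.sum((times == t) & events)
        s *= 1.0 - relapsed / at_risk
        values.append(s)
    # Number of event times <= g selects the step that is active at g.
    idx = np.searchsorted(event_times, grid, side="right")
    return np.asarray(values)[idx]

def survival_area(times_non, events_non, times_dist, events_dist, horizon=120):
    """Area between the two survival curves, divided by the horizon.

    Assumption: normalizing by the horizon is one way to express the
    area as a fraction; the abstract does not state the normalization.
    """
    grid = np.arange(0, horizon + 1, dtype=float)
    s_non = km_survival(times_non, events_non, grid)
    s_dist = km_survival(times_dist, events_dist, grid)
    diff = s_non - s_dist
    # Left Riemann sum, exact for step functions that change on the grid.
    return float(np.sum(diff[:-1] * np.diff(grid)) / horizon)

def weighted_objective(accuracy, area, w):
    """Assumed form: convex combination of accuracy and survival area."""
    return w * accuracy + (1.0 - w) * area
```

In this sketch the patients are grouped by their predicted class, so a gene subset that separates the two groups well yields both a high accuracy and a large area between the curves.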

Highlights

  • Few studies have investigated prognostic biomarkers of distant metastases of lung cancer

  • The proposed new optimization approach to identifying prognostic biomarkers by combining multiple sources of data can facilitate the accurate selection of biomarkers that are most relevant to the disease while solving the problem of insufficient samples

  • Performance evaluation with various weights: to determine the best value of the weight w and prevent overtraining in the subsequent design of gene selection methods, all 78 samples are randomly divided into five groups, of which four are used as a training set and the fifth serves as an independent test set
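The split described in the last highlight can be sketched as follows. This is an illustration under assumptions, not the published procedure: the helper names `five_group_split` and `best_weight` are hypothetical, the choice of the last group as the test set is arbitrary, and `evaluate` stands for any caller-supplied function that trains with a given w and returns a test score.

```python
import numpy as np

def five_group_split(n_samples, seed=0):
    """Randomly divide sample indices into five groups.

    Four groups form the training set; the remaining group is held
    out as an independent test set.
    """
    rng = np.random.default_rng(seed)
    groups = np.array_split(rng.permutation(n_samples), 5)
    train_idx = np.concatenate(groups[:-1])
    test_idx = groups[-1]
    return train_idx, test_idx

def best_weight(weights, evaluate, n_samples=78, seed=0):
    """Score each candidate w on the held-out group and keep the best.

    evaluate(w, train_idx, test_idx) is a hypothetical callback that
    fits a model on the training indices and returns a test score.
    """
    train_idx, test_idx = five_group_split(n_samples, seed)
    scores = {w: evaluate(w, train_idx, test_idx) for w in weights}
    return max(scores, key=scores.get), scores
```

With 78 samples, `np.array_split` gives three groups of 16 and two of 15, so this sketch trains on 63 samples and tests on 15.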


Summary

Introduction

Few studies have investigated prognostic biomarkers of distant metastases of lung cancer. One of the central difficulties in identifying biomarkers from microarray data is the availability of only a small number of samples, which results in overtraining. Recently obtained evidence reveals that epithelial–mesenchymal transition (EMT) of tumor cells causes metastasis, which is detrimental to patients’ survival. Some tumor cells acquire new characteristics, such as over-expression of EMT markers, and undergo profound morphogenetic changes. Potential regulators related to EMT have been identified in the literature, including growth factors [6,7], ligand-dependent nuclear receptors [8], transcription regulators [3,9], cytokines [7,10], and kinases [11,12]. Signaling pathways that are activated by intrinsic or extrinsic stimulation converge on the transcriptional factors and regulate phenotypic changes of cancer cells [9].

