Abstract

Ovarian cancer has the poorest prognosis of the gynecological cancers. Seventy-five percent of patients have stage III-IV disease at diagnosis. The 5-year survival rate of patients diagnosed at stage III or IV is less than 20%, whereas that of patients diagnosed at stage I is higher than 90%. The early detection and diagnosis of cancer is one of the major application areas of omics technology and has received tremendous attention in the last few years. This study investigates robust formulations of the feature selection algorithms used in the statistical analysis of omics profiles for diagnosing ovarian cancer. A common difficulty in feature selection for cancer-specific biomarker discovery is the inconsistency of the feature sets selected from different samples or by different selection methods. This issue is particularly crucial in biological applications, where subsequent biomedical analysis of the selected features requires considerable time and cost. We analyzed the reproducibility of different feature selection algorithms commonly employed in chemometrics and investigated their stability, to provide insight for designing proper feature selection algorithms for cancer-specific biomarker discovery. Two data sets were analyzed: MALDI-TOF-MS (proteomics) and LC/TOF-MS (metabolomics) spectra measured from serum samples to diagnose ovarian cancer. To evaluate the consistency of the feature selection methods, we ran each selection procedure repeatedly and systematically quantified a measure of similarity (similarity ≤ 1) among its outcomes. The results show that, in general, the multivariate feature selection methods outperformed the univariate methods, in that they selected more consistent features regardless of training set perturbations.
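
The repeated-selection procedure described above can be illustrated with a minimal sketch. The abstract does not name its similarity measure or its selection methods, so everything here is an assumption made for illustration: the Jaccard index stands in for the similarity measure (it is bounded by 1), a t-statistic ranking stands in for a univariate selector, random subsampling stands in for the training set perturbation, and the data are synthetic. The function names `jaccard`, `select_top_k`, and `stability` are hypothetical.

```python
import random
from itertools import combinations
from statistics import mean, stdev


def jaccard(a, b):
    """Jaccard index of two feature sets (assumed measure); 1.0 means identical."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def t_score(x0, x1):
    """Absolute Welch-style t statistic for one feature across two classes."""
    if len(x0) < 2 or len(x1) < 2:
        return 0.0  # not enough samples to estimate a variance
    se = (stdev(x0) ** 2 / len(x0) + stdev(x1) ** 2 / len(x1)) ** 0.5
    return abs(mean(x0) - mean(x1)) / se if se > 0 else 0.0


def select_top_k(X, y, k):
    """Hypothetical univariate selector: indices of the k features with largest |t|."""
    scores = []
    for j in range(len(X[0])):
        x0 = [row[j] for row, label in zip(X, y) if label == 0]
        x1 = [row[j] for row, label in zip(X, y) if label == 1]
        scores.append((t_score(x0, x1), j))
    scores.sort(reverse=True)
    return {j for _, j in scores[:k]}


def stability(X, y, k, n_runs=20, frac=0.8, seed=0):
    """Mean pairwise Jaccard similarity of selections made on random subsamples."""
    rng = random.Random(seed)
    n = len(X)
    selections = []
    for _ in range(n_runs):
        # Perturb the training set by drawing a random subsample without replacement.
        idx = rng.sample(range(n), int(frac * n))
        selections.append(select_top_k([X[i] for i in idx], [y[i] for i in idx], k))
    pairs = list(combinations(selections, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Synthetic data (not the study's spectra): 40 samples, 30 features,
# of which the first 3 are shifted between the two classes.
data_rng = random.Random(1)
y = [0] * 20 + [1] * 20
X = [[data_rng.gauss(3.0 * label if j < 3 else 0.0, 1.0) for j in range(30)]
     for label in y]

score = stability(X, y, k=3)
print(f"stability of top-3 selection: {score:.3f}")
```

A score near 1 means the selector returns almost the same feature set on every perturbed training set, while a score near 0 means the selected sets barely overlap. Swapping `select_top_k` for a different selector would allow the kind of comparison between methods that the abstract reports.
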
The results clearly show that some feature selection methods can suffer severely from irreproducibility, which might undermine the potential benefits of omics technology for cancer diagnosis. A possible remedy is a multi-objective formulation of the feature selection problem that simultaneously optimizes the stability and the accuracy of classification models; this will be investigated in future research.

Citation Format: {Authors}. {Abstract title} [abstract]. In: Proceedings of the 102nd Annual Meeting of the American Association for Cancer Research; 2011 Apr 2-6; Orlando, FL. Philadelphia (PA): AACR; Cancer Res 2011;71(8 Suppl):Abstract nr 5118. doi:10.1158/1538-7445.AM2011-5118
