Abstract

Background: The KRAS proto-oncogene is one of the most commonly mutated genes in Non-Small Cell Lung Cancer (NSCLC), with greater frequency in the adenocarcinoma (AD) histotype. Patients with mutant KRAS often do not benefit from standard therapy, and effective targeted therapies are still not available for these patients. Little is known about which pathways are activated in KRAS mutant ADs. Understanding which interactions drive KRAS mutant lesions can lead to the identification of novel targets for treating patients harboring a KRAS mutation more effectively. The aim of this study was to apply machine learning techniques to Reverse Phase Protein Microarray (RPPA) data to select the proteins whose activity best describes the KRAS status (wild type or mutated).

Materials and methods: A total of 58 samples (24 KRAS wild type and 34 KRAS mutated) were collected from surgically treated lung AD patients at the H. Lee Moffitt Cancer Center & Research Institute (Tampa, FL) and at the S. Maria della Misericordia Hospital (Perugia, Italy). Tumor cells were isolated with laser capture microdissection, and RPPA was performed to quantitatively measure the expression/activation levels of 155 proteins. Recursive Feature Elimination with Support Vector Machine (RFE-SVM) was used to rank proteins according to the absolute value of their weight in the hyperplane defined by the SVM to separate the 2 groups (KRAS wild type or mutated). The LSimpute algorithm was used to impute data missing due to depletion of biological sample. Stability and robustness of the results were achieved by using the RFE-SVM algorithm within an ensemble feature selection framework.

Results: The LSimpute algorithm was applied to impute missing data in 11 patients whose individual records lacked between 1% and 48% of the proteins; missing values accounted for 5% of the overall dataset. The accuracy of the tested algorithm was 0.90. The RFE-SVM algorithm was then applied to the entire dataset (58 samples). The analysis of the RPPA data revealed that the activation of many signaling proteins involved in the ERK pathway also discriminates between KRAS wild type and mutated samples. The highest-ranked proteins included p70S6K, ERK1/2 T202/Y204, EGFR, PP2A and Akt S473. The stability and robustness of the RFE-SVM output were confirmed on the completed dataset.

Conclusion: The proposed methodology is the first example of a computational approach based on machine learning algorithms applied to the analysis of proteomic data in cancer translational research. The output of the procedure is a ranking of proteins that could play key roles in the signaling pathways of patients harboring KRAS mutations when compared to KRAS wild type patients. Results obtained from this study could make important contributions to the identification of proteins that can be targeted to develop more effective treatments for AD patients with KRAS mutations. Furthermore, this methodology can overcome the issue of missing values in RPPA datasets, generating a stable and robust complete output.

Citation Format: Fortunato Bianconi, Elisa Baldelli, Federico Patiti, Paolo Valigi, Eric B. Haura, Lucio Crinò, Vienna Ludovini, Emanuel Petricoin, Mariaelena Pierobon. A machine learning approach applied to Reverse Phase Protein Microarray data for pathways activation mapping of KRAS wild type and mutated adenocarcinomas of the lung. [abstract]. In: Proceedings of the AACR Special Conference on Computational and Systems Biology of Cancer; Feb 8-11 2015; San Francisco, CA. Philadelphia (PA): AACR; Cancer Res 2015;75(22 Suppl 2):Abstract nr B1-14.
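The RFE-SVM ranking step described in the methods can be illustrated with a minimal sketch. It is not the authors' implementation: the subgradient-descent trainer, its hyperparameters (`lam`, `epochs`, `lr`), the function names, and the synthetic data with one informative feature are all assumptions made for illustration, and imputation and the ensemble framework are omitted. Only the group sizes (24 and 34) mirror the abstract.

```python
import random

def train_linear_svm(X, y, lam=0.01, epochs=200, lr=0.1):
    """Fit a linear SVM by full-batch subgradient descent on the
    L2-regularised hinge loss (illustrative trainer, assumed settings)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [lam * wj for wj in w], 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # only margin violators contribute to the hinge subgradient
                for j in range(d):
                    gw[j] -= yi * xi[j] / n
                gb -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, gw)]
        b -= lr * gb
    return w, b

def rfe_rank(X, y, names):
    """Recursive feature elimination: repeatedly refit the SVM and drop the
    feature with the smallest absolute weight. Returns names ordered from
    most to least discriminative."""
    remaining = list(range(len(names)))
    eliminated = []
    while remaining:
        X_sub = [[row[j] for j in remaining] for row in X]
        w, _ = train_linear_svm(X_sub, y)
        worst = min(range(len(remaining)), key=lambda k: abs(w[k]))
        eliminated.append(remaining.pop(worst))
    return [names[j] for j in reversed(eliminated)]

# Synthetic stand-in for RPPA data: 24 "wild type" (-1) and 34 "mutated" (+1)
# samples; feature names are hypothetical, not proteins from the study.
rng = random.Random(0)
labels = [-1] * 24 + [1] * 34
names = ["informative", "noise_1", "noise_2", "noise_3", "noise_4"]
X = [[2.0 * yi + rng.gauss(0, 0.5)] + [rng.gauss(0, 1) for _ in range(4)]
     for yi in labels]

ranking = rfe_rank(X, labels, names)
print(ranking)
```

In this toy setting the feature built to separate the two groups should end up at the top of the ranking, while the order of the noise features carries no meaning. An ensemble variant would repeat `rfe_rank` on resampled subsets of the samples and aggregate the resulting ranks.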
