Abstract
e21016

Background: Advances in high-throughput measurement technologies (-omics data) have made it possible to generate high-complexity, high-volume data for oncology research. Researchers are often confronted with many more measurements than samples (p >>> n), which makes it difficult both to model the complexity of disease at the level of molecular mechanism and to avoid overfitting when building predictive models from complex data. Here, we applied a prior knowledge-driven approach to characterize and classify heavy versus light smokers with lung cancer from The Cancer Genome Atlas (TCGA), an open-source repository that catalogs, harmonizes, and hosts -omics data collected from samples generously donated by cancer patients.

Methods: We applied a reverse inferencing approach to systematically interrogate RNAseq measurements from tumor and control biopsies against a knowledgebase of directed gene networks curated from published experiments. If the patterns observed in the data are significantly similar to those in a network, an inference can be made about the directional activity of that network, e.g., increased transcriptional activity of NFKB. Our library was nucleated from an open-source knowledge graph and enhanced with updated and relevant knowledge using the open-source Biological Expression Language framework. Directed networks were either qualitatively scored and used to build disease models, or semi-quantitatively scored and used as classification features.

Results: In lung adenocarcinoma (LUAD) tumors, we detected a pattern of gene signatures indicating a tumor stem cell-like phenotype, characterized by predicted decreases in the activity of pro-differentiation factors and an increased response to hypoxia. Analysis of patients with heavy (> 40) versus light (< 10) pack-year burden suggested an augmented dedifferentiation profile in heavy smokers. In this example, compressing features through directed network scoring improved classification compared with using individual RNA measurements selected by filtration methods.

Conclusions: In silico analysis of lung cancer patient biopsies generated hypotheses implicating stem cell signaling in tumors, and a further stratification of this signal by patient pack-year burden. Mechanistic modeling may be a useful way to address the overfitting problem often encountered with -omics data in translational studies. Data from other TCGA indications can be used to evaluate the consistency of this type of approach.
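To illustrate the general idea of directed network scoring described in the Methods, the following is a minimal sketch and not the scoring procedure used in this study. It assumes a network can be represented as a map from downstream genes to the sign expected when the upstream node is more active, that concordance is tested with an exact binomial test, and that the semi-quantitative score is the mean signed log2 fold change. The function names, the threshold of 1.0, and the gene names and values in the toy data are all hypothetical.

```python
from math import comb


def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value: total probability of all
    outcomes that are no more likely than observing k successes."""
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    return min(1.0, sum(x for x in pmf if x <= pmf[k] * (1 + 1e-9)))


def score_network(network, log2fc, threshold=1.0):
    """Score one directed network against differential expression data.

    network: gene -> expected sign (+1 or -1) if the upstream node is increased
    log2fc:  gene -> observed log2 fold change (tumor vs. control)
    threshold: hypothetical cutoff for calling a gene "changed"
    """
    concordant = discordant = 0
    signed_sum = 0.0
    for gene, expected in network.items():
        change = log2fc.get(gene)
        if change is None or abs(change) < threshold:
            continue  # gene not measured, or not considered changed
        if (change > 0) == (expected > 0):
            concordant += 1
        else:
            discordant += 1
        signed_sum += expected * change

    n = concordant + discordant
    if n == 0:
        return {"direction": "none", "p_value": 1.0, "score": 0.0, "n": 0}

    if concordant > discordant:
        direction = "increased"   # qualitative call: upstream node more active
    elif discordant > concordant:
        direction = "decreased"   # qualitative call: upstream node less active
    else:
        direction = "ambiguous"

    return {
        "direction": direction,
        "p_value": binomial_two_sided_p(concordant, n),
        "score": signed_sum / n,  # semi-quantitative feature for a classifier
        "n": n,
    }


# Hypothetical toy data: placeholder gene names, made-up fold changes.
toy_network = {"GENE_A": +1, "GENE_B": +1, "GENE_C": +1,
               "GENE_D": -1, "GENE_E": -1, "GENE_F": +1}
toy_log2fc = {"GENE_A": 2.0, "GENE_B": 1.5, "GENE_C": 1.2,
              "GENE_D": -1.8, "GENE_E": -2.2, "GENE_F": 0.3, "GENE_G": 3.0}

result = score_network(toy_network, toy_log2fc)
print(result)
```

Under these assumptions, each network contributes a single score per sample, so a classifier sees one feature per network instead of one per transcript, which is the sense in which network scoring compresses the feature space.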