The benefit and burden of contemporary techniques for the molecular characterization of samples is the vast amount of data they generate. In the era of “big data”, it has become imperative to develop multidisciplinary teams that combine scientists, clinicians, and data analysts. In this review, we discuss several approaches developed by our University of Texas MD Anderson Lung Cancer Multidisciplinary Program to process and use such large datasets, with the goal of identifying rational therapeutic options for biomarker-driven patient subsets. Large integrated datasets, such as The Cancer Genome Atlas (TCGA) for patient samples and the Cancer Cell Line Encyclopedia (CCLE) for tumor-derived cell lines, include genomic, transcriptomic, methylation, miRNA, and proteomic profiling alongside clinical data. Using these datasets to best effect required the development of tools for systematic, unbiased, high-throughput analysis. These tools address urgent questions: whether we can define molecular subtypes of disease with specific therapeutic vulnerabilities, how to quantify states such as epithelial-to-mesenchymal transition that are associated with treatment resistance, and how to identify potential therapeutic agents in models of cancer that are resistant to standard treatments. Used together in a multidisciplinary environment, such tools can be leveraged to identify novel treatments for molecularly defined subsets of cancer patients, and these treatments can be easily and rapidly translated from benchtop to bedside.