Abstract

Failure to adequately characterize cell lines, and to understand the differences between in vitro and in vivo biology, can have serious consequences for the translatability of in vitro scientific studies to human clinical trials. This project focuses on Michigan Cancer Foundation-7 (MCF-7) cells, a human breast adenocarcinoma cell line that is commonly used for in vitro cancer research, with over 42,000 publications in PubMed. In this study, we explore the key similarities and differences between the gene expression networks of MCF-7 cell lines and those of human breast cancer tissues. We used two MCF-7 data sets: one collected by ARCHS4, comprising 1032 samples, and one from Gene Expression Omnibus (GSE50705), comprising 88 estradiol-treated MCF-7 samples. The human breast invasive ductal carcinoma (BRCA) data set came from The Cancer Genome Atlas and includes 1212 breast tissue samples. Weighted Gene Correlation Network Analysis (WGCNA) and functional annotations of the data showed that MCF-7 cells and human breast tissues have only minimal similarity in biological processes, although some fundamental functions, such as cell cycle, are conserved. Scaled connectivity, a network topology metric, also showed drastic differences in the behavior of genes between the MCF-7 and BRCA data sets. Finally, we used canSAR to compute ligand-based druggability scores of genes in the data sets, and our results suggested that using MCF-7 to study breast cancer may lead to missing important gene targets. Our comparison of the networks of MCF-7 and human breast cancer highlights the nuances of using MCF-7 to study human breast cancer and can contribute to better experimental design and interpretation of results in studies involving this cell line.
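To make the scaled connectivity metric mentioned in the abstract concrete, the following is a minimal sketch, not the authors' pipeline (WGCNA itself is an R package). It assumes an unsigned network with soft-threshold adjacency |cor|^beta, defines a gene's connectivity as the sum of its adjacencies to all other genes, and divides by the maximum connectivity in the network. The function names, the toy data, and the choice beta = 6 are illustrative assumptions, not values taken from the study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles.

    Assumes neither profile is constant (otherwise the denominator is zero).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def scaled_connectivity(expr, beta=6):
    """Return connectivity of each gene divided by the network maximum.

    expr: dict mapping gene name -> list of expression values (one per sample).
    beta: soft-threshold power (hypothetical default, not from the study).
    """
    genes = list(expr)
    connectivity = {}
    for g in genes:
        # Unsigned soft-threshold adjacency, summed over all other genes.
        connectivity[g] = sum(
            abs(pearson(expr[g], expr[h])) ** beta for h in genes if h != g
        )
    k_max = max(connectivity.values())
    return {g: connectivity[g] / k_max for g in genes}


# Toy data: A and B are perfectly correlated, C is only weakly related to both.
toy_expr = {
    "A": [1.0, 2.0, 3.0, 4.0],
    "B": [2.0, 4.0, 6.0, 8.0],
    "C": [4.0, 1.0, 3.0, 2.0],
}
scaled = scaled_connectivity(toy_expr)
```

In this toy network, A and B act as hubs with scaled connectivity near 1, while C sits near 0. Comparing such per-gene values between two networks is one way to see whether a gene plays a similar topological role in each.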

Highlights

  • Cell lines have been extensively used as models for human biology and have contributed to many insights: from the development of vaccines and toxicology screening, to the study of disease mechanisms and treatments

  • We selected three data sets based on human breast cancer tissues: 1) the TCGA data set of invasive breast cancer biopsies (breast invasive ductal carcinoma (BRCA)), which has the advantage of reflecting human in vivo samples, although biopsies by their nature include a mix of different tissues; 2) the ARCHS4 collection of Michigan Cancer Foundation-7 (MCF-7) samples, which is an attempt to massively mine publicly available RNA-seq experiments and consists of 1032 samples combined from Gene Expression Omnibus (GEO); and 3) a smaller study of MCF-7 cells exposed to estrogen in a dose-response curve

  • As the data sets involve a range of different technologies, preprocessing strategies, and in the case of ARCHS4, potentially many different biological conditions, we began with the basic initial step of reducing the gene expression set to the top 10,000 most variant genes, to eliminate genes that were minimally or inconsistently expressed and would confound the use of a correlation-based approach
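The variance filter described in the last highlight can be sketched as follows. This is an illustrative example only, assuming expression values are held in a dictionary keyed by gene and that "most variant" means highest sample variance across samples; the function name, toy genes, and values are hypothetical and not taken from the study.

```python
from statistics import variance


def top_variable_genes(expr, n_top=10_000):
    """Keep the n_top genes with the highest sample variance.

    expr: dict mapping gene name -> list of expression values (one per sample).
    Each gene needs at least two samples for the variance to be defined.
    """
    # Rank genes from most to least variable; sorted() is stable, so ties
    # keep their original relative order.
    ranked = sorted(expr, key=lambda gene: variance(expr[gene]), reverse=True)
    return {gene: expr[gene] for gene in ranked[:n_top]}


# Toy data: one flat gene, one highly variable gene, one mildly variable gene.
toy = {
    "G_flat": [1.0, 1.0, 1.0, 1.0],
    "G_high": [0.0, 5.0, 10.0, 15.0],
    "G_mid": [2.0, 3.0, 2.0, 3.0],
}
filtered = top_variable_genes(toy, n_top=2)
```

With n_top=2, the flat gene is dropped. Genes with near-zero variance carry little information for a correlation-based method, which is the stated motivation for the filter.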

Introduction

Cell lines have been extensively used as models for human biology and have contributed to many insights, from the development of vaccines and toxicology screening to the study of disease mechanisms and treatments. Despite these achievements, there have been growing concerns about the quality of cell lines (Hartung, 2007), ranging from cell-line misidentification and irreproducible studies to failed clinical trials (Schweppe et al, 2008; Gillet et al, 2013; Hartung, 2013). Not all cancer cell lines have the same value as models to study cancer in humans (Gillet et al, 2013).
