Abstract
Aneuploidy, defined as abnormal chromosome number or somatic DNA copy number, is a characteristic of many aggressive tumors and is thought to drive tumorigenesis. Gene expression-aneuploidy association studies have previously been conducted to explore cellular mechanisms associated with aneuploidy. However, in an observational setting, gene expression is influenced by many factors that can act as confounders between gene expression and aneuploidy, leading to spurious correlations between the two variables. These factors include known confounders, such as sample purity and batch effects, as well as gene co-regulation, which induces correlations between the expression of causal and non-causal genes. We use a linear mixed-effects model (LMM) to account for the confounding effects of tumor purity and gene co-regulation on gene expression-aneuploidy associations. Applied to patient tumor data across diverse tumor types, the LMM both accounts for the impact of purity on aneuploidy measurements and identifies a new association between histone gene expression and aneuploidy.
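The abstract does not spell out the model, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes a GWAS-style LMM for each gene j: aneuploidy = intercept + beta_j * expression_j + g + noise, where the random effect g has covariance sigma_g^2 * K and K = Z Z^T / m is a sample-by-sample similarity matrix built from the standardized expression Z of all m genes (analogous to a kinship matrix). The intuition is that purity and co-regulation shift many genes together, so they appear as structure in K that the random effect can absorb. The variance ratio is estimated once under the null model by a maximum-likelihood grid search, in the spirit of EMMAX; the function name `lmm_assoc` and the grid are hypothetical.

```python
import numpy as np

def lmm_assoc(expr, aneuploidy, deltas=np.logspace(-3, 3, 61)):
    """Hypothetical EMMAX-style LMM scan (illustrative sketch only).

    expr: (n_samples, n_genes) expression, no constant genes assumed.
    aneuploidy: (n_samples,) aneuploidy score per sample.
    Returns (betas, t_stats), one entry per gene.
    """
    x = np.asarray(expr, dtype=float)
    y = np.asarray(aneuploidy, dtype=float)
    n, m = x.shape
    # Standardize each gene and build a sample-by-sample similarity matrix
    # from all genes (assumption: plays the role of a kinship matrix).
    z = (x - x.mean(axis=0)) / x.std(axis=0)
    K = z @ z.T / m
    # Eigendecompose K once; rotating by U^T makes the covariance diagonal.
    s, U = np.linalg.eigh(K)
    s = np.clip(s, 0.0, None)  # drop tiny negative round-off eigenvalues
    yr = U.T @ y
    onesr = U.T @ np.ones(n)

    # Null model (intercept only): choose delta = sigma_e^2 / sigma_g^2
    # by maximizing the profile log-likelihood over a fixed grid.
    def null_loglik(delta):
        w = 1.0 / (s + delta)
        b = (onesr * w) @ yr / ((onesr * w) @ onesr)
        r = yr - onesr * b
        sg2 = (r * w) @ r / n
        return -0.5 * (n * np.log(2 * np.pi * sg2) + np.log(s + delta).sum() + n)

    delta = max(deltas, key=null_loglik)
    w = 1.0 / (s + delta)  # GLS weights in the rotated space

    # With delta fixed, test each gene by weighted least squares (GLS).
    xr = U.T @ x
    betas = np.empty(m)
    tstats = np.empty(m)
    for j in range(m):
        X = np.column_stack([onesr, xr[:, j]])
        XtW = X.T * w                     # X^T W
        cov = np.linalg.inv(XtW @ X)      # (X^T W X)^-1
        b = cov @ (XtW @ yr)
        r = yr - X @ b
        sg2 = (r * w) @ r / (n - 2)
        betas[j] = b[1]
        tstats[j] = b[1] / np.sqrt(sg2 * cov[1, 1])
    return betas, tstats
```

For simplicity the tested gene is also included in K; a more careful implementation might exclude it (or its co-regulated neighbors) from K to avoid absorbing part of its own signal into the random effect.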
Highlights
Aneuploidy, defined as abnormal chromosome number or somatic DNA copy number, is a characteristic of many aggressive tumors and is thought to drive tumorigenesis
We demonstrate that the linear mixed-effects model (LMM) can correct for confounding due to purity in a gene expression-aneuploidy association study and identify a new association between histone gene expression and aneuploidy
The LMM may also correct for other, potentially unmeasured, confounders such as batch effects, which are known to influence the expression of many genes[30,31,32]
Summary
Aneuploidy, defined as abnormal chromosome number or somatic DNA copy number, is a characteristic of many aggressive tumors and is thought to drive tumorigenesis. Large amounts of tumor sequencing data from projects such as The Cancer Genome Atlas (TCGA) have allowed researchers to conduct expression-based studies that identify genes whose expression is significantly associated with aneuploidy[11,12,13], on the premise that the genes whose expression levels correlate most strongly with aneuploidy across patient tumor samples are the most likely to have a mechanistic relationship with it. These studies use a simple linear regression (SLR) model, with gene expression level as the predictor variable and an aneuploidy metric calculated from DNA copy number data as the response variable, to test each gene individually for association.
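The per-gene SLR described above can be sketched as follows, assuming a samples-by-genes expression matrix and one aneuploidy score per sample; the helper name `slr_per_gene` is hypothetical. Each gene gets its own ordinary least-squares fit of aneuploidy on expression, and the slope t-statistic is returned; two-sided p-values would follow from a t distribution with n - 2 degrees of freedom (for example via `scipy.stats.t.sf`).

```python
import numpy as np

def slr_per_gene(expr, aneuploidy):
    """Hypothetical helper: SLR of aneuploidy on each gene separately.

    expr: (n_samples, n_genes) expression levels.
    aneuploidy: (n_samples,) aneuploidy score per sample.
    Returns (slopes, t_stats), one entry per gene.
    """
    x = np.asarray(expr, dtype=float)
    y = np.asarray(aneuploidy, dtype=float)
    n = y.shape[0]
    xc = x - x.mean(axis=0)            # center each gene
    yc = y - y.mean()                  # center the response
    sxx = (xc ** 2).sum(axis=0)        # per-gene sum of squares
    slopes = xc.T @ yc / sxx           # OLS slope for every gene at once
    resid = yc[:, None] - xc * slopes  # residuals, one column per gene
    sigma2 = (resid ** 2).sum(axis=0) / (n - 2)  # residual variance
    se = np.sqrt(sigma2 / sxx)         # standard error of each slope
    return slopes, slopes / se
```

Because every gene is tested in isolation, a non-causal gene that is co-regulated with a causal one, or that tracks tumor purity, receives a large t-statistic too; this is the confounding the LMM is meant to address.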