Abstract

Gene expression variation is a quantitative trait that drives phenotypic diversity across populations. At the cellular level, gene expression is an intermediate phenotype between stored genetic information and the functional use of that information within the cell. Genome-Wide Association Studies (GWAS) have identified thousands of genetic polymorphisms associated with numerous diseases. These have provided many novel insights into the disrupted biological processes that drive the etiology of various health conditions. Expression Quantitative Trait Loci (eQTL) provide an additional layer of biological information about the physiological impact of common genetic variants. The study of the genetic regulation of gene expression (eQTL studies) has therefore been useful in both the validation and the functional characterisation of GWAS polymorphisms, and has contributed to a better understanding of the precise molecular processes that lead to disease. Global transcriptomic analyses have provided greater insight into the level of complexity that drives biological systems. Transcriptomic data often contain gene regulatory and co-expression networks, an emergent property of transcriptomic and other omic data. The networks within each omics field interact with each other, adding further layers of complexity to biological systems. Variation contained within gene expression datasets can therefore provide detail on the flow of information through these biological systems and on how that flow can be influenced by genetic polymorphisms. Transcriptomic variation is highly influenced by genetic and environmental factors. Genetic regulation of gene expression represents, with some exceptions, fixed regulatory points that strictly control the expression of genes. Variance attributed to environmental effects, on the other hand, often reflects biological responses to specific stimuli.
The dissection of the genetic and environmental influences on expression levels will help to form a baseline upon which network models can be built to describe the flow of biological information in healthy, latent or disease groups. This thesis details both methods to clean data and statistical approaches to analyze the complexities found within the variance of transcriptomic data. The focus of this thesis is the dissection of three major influences on gene expression variability: technical artifacts, environmental variation and genetic variation. Using statistical and quantitative genetic techniques on array-based genotype and gene expression datasets, this thesis examines: (1) the use of Principal Components Analysis (PCA) to identify and correct for known batch effects; (2) seasonal variation as a pervasive environmental contributor to gene expression variation; and (3) the genetic contribution driving robust gene co-expression modules. The Brisbane Systems Genetics Study (BSGS), which comprises both unrelated and related individuals, was used throughout these three studies. The Center for Health Discovery and Well Being (CHDWB) cohort was used as a replication study and the Multiple Tissue Human Expression Resource (MuTHER) cohort was employed to examine tissue-specific effects. The first chapter provides a technical, methodological analysis of PCA as a batch effect correction technique. Batch effects have a large impact on gene expression variability, often creating artificial systematic trends. By decomposing the data into Principal Components (PCs) we were able to quantify the degree and distribution of technical artifacts within gene expression datasets and determine the effectiveness of this correction method. The second chapter examines the influence of pervasive macro-environmental factors on gene expression datasets and provides a statistical framework to identify seasonal variation.
Since datasets are often collected over time, samples may contain seasonal trends in gene expression that are environmentally driven and are not regarded as technical artifacts. Using loess decomposition and cosinor regression, 74 transcripts with a significant seasonal trend were identified, independently of seasonal variation in blood cell count. Chapter three examines the genetic contribution to gene expression covariance between Blood Informative Transcripts (BITs), a set of transcripts comprising nine modules that have been previously identified and validated. Using quantitative genetic techniques, the genetic and environmental components driving phenotypic correlations for BIT transcripts were quantified. When compared to 10,000 bootstrap permutations of random probes, the BITs demonstrate significant genetic correlation (average 0.63 across all BITs) and an average genetic contribution to phenotypic correlations of 0.42. The high degree of genetic correlation demonstrates a strong genetic framework regulating the expression of BIT modules. This chapter also examines the presence of these replicated modules in three separate tissue types, identifying several tissue-specific co-expression modules. Overall, this thesis explores the landscape of gene expression variability. A methodological framework for the identification of technical artifacts and seasonal effects is investigated and the genetic architecture driving transcriptomic co-expression is characterized.
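
The abstract names cosinor regression but does not give the model specification used in the thesis. As an illustration only, the following is a minimal sketch of a single-component cosinor fit for one transcript, assuming an annual period of 365.25 days and sampling times recorded as day counts. The function name `fit_cosinor` and its return keys are hypothetical, and the sketch omits the loess step, covariates such as blood cell count, and significance testing.

```python
import numpy as np


def fit_cosinor(day, expression, period=365.25):
    """Least-squares fit of y = M + A * cos(2*pi*(t - peak) / period).

    Illustrative sketch only; not the thesis's actual model.
    The cosine is linearised as M + b*cos(w*t) + g*sin(w*t), so the
    fit reduces to ordinary least squares on three columns.
    """
    t = np.asarray(day, dtype=float)
    y = np.asarray(expression, dtype=float)
    w = 2.0 * np.pi / period

    # Design matrix: intercept (mesor), cosine term, sine term.
    X = np.column_stack([np.ones_like(t), np.cos(w * t), np.sin(w * t)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    mesor, b, g = coef

    # b*cos(w*t) + g*sin(w*t) = A*cos(w*t - phi), with phi = atan2(g, b).
    amplitude = float(np.hypot(b, g))
    peak_day = float((np.arctan2(g, b) / w) % period)
    return {"mesor": float(mesor), "amplitude": amplitude, "peak_day": peak_day}


if __name__ == "__main__":
    # Synthetic, noise-free transcript peaking on day 200 (made-up values).
    days = np.arange(0, 730, 5)
    signal = 8.0 + 0.5 * np.cos(2.0 * np.pi * (days - 200) / 365.25)
    print(fit_cosinor(days, signal))
```

In practice a fit of this kind would be run per transcript, with the amplitude tested against zero and the resulting p-values corrected for multiple testing; those steps are left out here because the abstract does not describe them.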
