Extensive simulations assess the performance of genome-wide association mapping in various Saccharomyces cerevisiae subpopulations

Abstract

With the advent of high-throughput sequencing technologies, genome-wide association studies (GWAS) have become a powerful paradigm for dissecting the genetic origins of the observed phenotypic variation. We recently completely sequenced the genome of 1011 Saccharomyces cerevisiae isolates, laying a strong foundation for GWAS. To assess the feasibility and the limits of this approach, we performed extensive simulations using five selected subpopulations as well as the total set of 1011 genomes. We measured the ability to detect the causal genetic variants involved in Mendelian and more complex traits using a linear mixed model approach. The results showed that population structure is well accounted for and is not the main problem when the sample size is high enough. While the genetic determinant of a Mendelian trait is easily mapped in all studied subpopulations, discrepancies are seen between datasets when performing GWAS on a complex trait in terms of detection, false positive and false negative rates. Finally, we performed GWAS on the different defined subpopulations using a real quantitative trait (resistance to copper sulfate) and showed the feasibility of this approach. The performance of each dataset depends simultaneously on several factors such as sample size, relatedness and population evolutionary history. This article is part of the theme issue ‘Genetic basis of adaptation and speciation: from loci to causative mutations’.
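The abstract compares datasets by detection, false positive and false negative rates, but does not give the exact definitions. A minimal sketch of one plausible scoring scheme for a single simulated trait follows. The function name `score_gwas`, the variant indices and the rate definitions are all assumptions made for illustration, not the authors' procedure; in particular, variants in linkage disequilibrium with a causal variant are counted here as false positives, which a real analysis might treat differently.

```python
def score_gwas(significant, causal, n_variants):
    """Score one simulated GWAS run (hypothetical definitions).

    significant: indices of variants that passed the significance threshold
    causal:      indices of the simulated causal variants
    n_variants:  total number of variants tested
    """
    significant, causal = set(significant), set(causal)
    true_pos = len(significant & causal)   # causal variants that were detected
    false_pos = len(significant - causal)  # significant but not causal
    false_neg = len(causal - significant)  # causal but missed
    n_null = n_variants - len(causal)      # non-causal variants tested
    return {
        "detection_rate": true_pos / len(causal) if causal else 0.0,
        "false_positive_rate": false_pos / n_null if n_null else 0.0,
        "false_negative_rate": false_neg / len(causal) if causal else 0.0,
    }


# Toy example: 4 simulated causal variants among 1000, 3 significant hits.
result = score_gwas(significant=[3, 7, 42], causal=[7, 42, 99, 100], n_variants=1000)
print(result)
```

Averaging such scores over many simulated traits per subpopulation would give the kind of per-dataset comparison the abstract describes.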

Similar Papers
  • Research Article
  • Citations: 83
  • 10.1161/circgen.118.002090
Human Genetics of Obesity and Type 2 Diabetes Mellitus: Past, Present, and Future.
  • Jun 1, 2018
  • Circulation: Genomic and Precision Medicine
  • Erik Ingelsson + 1 more

Type 2 diabetes mellitus (T2D) and obesity already represent 2 of the most prominent risk factors for cardiovascular disease, and are destined to increase in importance given the global changes in lifestyle. Ten years have passed since the first round of genome-wide association studies for T2D and obesity. During this decade, we have witnessed remarkable developments in human genetics. We have graduated from the despair of candidate gene-based studies that generated few consistently replicated genotype-phenotype associations, to the excitement of an exponential harvest of loci robustly associated with medical outcomes through ever larger genome-wide association study meta-analyses. As well as discovering hundreds of loci, genome-wide association studies have provided transformative insights into the genetic architecture of T2D and other complex traits, highlighting the extent of polygenicity and the tiny effect sizes of many common risk alleles. Genome-wide association studies have also provided a critical starting point for discovering new biology relevant to these traits. Expectations are high that these discoveries will foster development of more effective strategies for intervention, through optimization of precision medicine approaches. In this article, we review current knowledge and provide suggestions for the next steps in genetic research for T2D and obesity. We focus on four areas relevant to precision medicine: genetic architecture, pharmacogenetics and other gene-environment interactions, mechanistic inference, and drug development. As we describe, the genetic architecture of complex traits has major implications for the prospects of precision medicine, rendering some anticipated approaches decidedly unrealistic. 
We highlight obstacles to the translation of human genetic findings into mechanism inference but are optimistic that, as these are overcome, there is untapped potential for novel drugs and more effective strategies for treating and preventing T2D and obesity.

  • Research Article
  • Citations: 43
  • 10.1038/s41598-018-37216-z
Genome wide association study of body weight and feed efficiency traits in a commercial broiler chicken population, a re-visitation
  • Jan 29, 2019
  • Scientific Reports
  • Wossenie Mebratie + 4 more

A genome-wide association study was conducted using a mixed linear model (MLM) approach that accounted for family structure to identify single nucleotide polymorphisms (SNPs) and candidate genes associated with body weight (BW) and feed efficiency (FE) traits in a broiler chicken population. The results of the MLM approach were compared with the results of a general linear model approach that does not take family structure into account. In total, 11 quantitative trait loci (QTL) and 21 SNPs were identified to be significantly associated with BW traits, and 5 QTL and 5 SNPs were found to be associated with FE traits using the MLM approach. Besides some overlaps between the results of the two GWAS approaches, there are considerable differences in the detected QTL. Even though the genomic inflation factor (λ) values indicate that there is no strong family structure in this population, using models that account for the existing family structure may reduce bias and increase accuracy of the estimated SNP effects in the association analysis. The SNPs and candidate genes identified in this study provide information on the genetic background of BW and FE traits in broiler chickens and might be used as prior information for genomic selection.
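The genomic inflation factor (λ) mentioned above is commonly computed as the median of the observed 1-degree-of-freedom chi-square association statistics divided by the median expected under the null (about 0.455). The sketch below assumes that standard definition and two-sided p-values as input; the function name `genomic_inflation` and the toy p-values are illustrative and are not taken from the study.

```python
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()
# Median of a chi-square distribution with 1 df, i.e. (z at the 75th percentile)^2.
CHI2_1DF_MEDIAN = _STD_NORMAL.inv_cdf(0.75) ** 2  # about 0.4549


def genomic_inflation(p_values):
    """Return lambda = median(observed chi-square) / median(expected chi-square).

    p_values: two-sided association p-values, each in (0, 1].
    """
    # A two-sided p-value p corresponds to a 1-df chi-square statistic z^2,
    # where z is the standard normal quantile at p / 2.
    chi2 = [_STD_NORMAL.inv_cdf(p / 2.0) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN


# Evenly spaced p-values mimic a well-calibrated (null) scan: lambda is close to 1.
null_like = [(i + 0.5) / 1000 for i in range(1000)]
lam = genomic_inflation(null_like)
print(round(lam, 3))
```

Values of λ well above 1 are usually read as a sign of unaccounted structure or relatedness, which is the interpretation the abstract relies on.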

  • Research Article
  • Citations: 103
  • 10.1016/j.ajhg.2011.04.023
DASH: A Method for Identical-by-Descent Haplotype Mapping Uncovers Association with Recent Variation
  • May 27, 2011
  • The American Journal of Human Genetics
  • Alexander Gusev + 9 more


  • Research Article
  • Citations: 5
  • 10.1007/s10722-022-01352-3
Association mapping, trait variation, interaction and population structure analysis in cucumber (Cucumis sativus L.)
  • Feb 5, 2022
  • Genetic Resources and Crop Evolution
  • Rahul Kumar + 7 more

In several regions of the world, low productivity in this crop is attributed to several factors including poor understanding of the genomic complexity of important traits associated with fruit quality and yield. Therefore, genome-wide association analysis was performed for important traits using simple sequence repeat (SSR) markers. Significant variation was recorded for all the studied traits in 78 cucumber genotypes under two environments (open field and net house), which indicated that the constituted association panel was suitable for association mapping. Genotyping was done using 60 highly polymorphic SSRs. In a genome scan of the 60 SSR markers using a mixed linear model (MLM) approach, 4 and 6 markers explained an average of 23.93% and 17.37% of the trait variation under net house and open field conditions, respectively. Based on the MLM approach, two markers, on the 3rd chromosome (UW084942) and the 4th chromosome (UW062953), were found to be associated with the average fruit weight (g) under both net house and open field conditions. Population structure analysis revealed four distinct sub-populations that corroborated with the geographical origin as well as fruit quality and quantitative traits. The four sub-populations (A–D) had fixation index percentages equal to 24.35, 29.48, 37.17 and 8.97, respectively, supporting the existence of moderate population structure. Therefore, the extensive phenotypic and genotypic characterization, population structure, and markers associated with important traits provided in this study will facilitate marker-assisted improvement programs in cucumber.

  • Research Article
  • Citations: 15
  • 10.1038/s41598-021-90774-7
Robustification of GWAS to explore effective SNPs addressing the challenges of hidden population stratification and polygenic effects
  • Jun 22, 2021
  • Scientific Reports
  • Zobaer Akond + 3 more

Genome-wide association studies (GWAS) play a vital role in identifying important genes that are associated with the phenotypic variations of living organisms. There are several statistical methods for GWAS, including the linear mixed model (LMM), which is popular for addressing the challenges of hidden population stratification and polygenic effects. However, most of these methods, including LMM, are sensitive to phenotypic outliers that may lead to misleading results. To overcome this problem, in this paper, we proposed a way to robustify the LMM approach for reducing the influence of outlying observations using the β-divergence method. The performance of the proposed method was investigated using both synthetic and real data analysis. Simulation results showed that the proposed method performs better than both the linear regression model (LRM) and LMM approaches in terms of power and false discovery rate in the presence of phenotypic outliers. On the other hand, the proposed method performed almost the same as the LMM approach but much better than the LRM approach in the absence of outliers. In the case of real data analysis, our proposed method identified 11 SNPs that are significantly associated with rice flowering time. Among the identified candidate SNPs, some were involved in seed development and flowering time pathways, and some were connected with flower and other developmental processes. These identified candidate SNPs could assist rice breeding programs effectively. Thus, our findings highlighted the importance of robust GWAS in identifying candidate genes.

  • Research Article
  • 10.1155/2015/564273
Statistical Analysis of High-Dimensional Genetic Data in Complex Traits
  • Jan 1, 2015
  • BioMed Research International
  • Taesung Park + 3 more

With the recent development of high-throughput DNA microarray and next-generation sequencing techniques for detecting various genomic variants (SNVs, CNVs, INDELs, etc.), genome-wide association studies (GWASs) have become a popular strategy to discover genetic factors affecting common complex diseases. Many GWASs have successfully identified genetic risk factors associated with common diseases and have achieved substantial success in unveiling genomic regions responsible for the various aspects of phenotypes. However, identifying the underlying mechanism of disease susceptible loci has proven to be difficult due to the complex genetic architecture of common diseases. The previously associated variants through GWASs only explain a small portion of the genetic factors in complex diseases. This rather limited finding is partly ascribed to the lack of intensive analysis on undiscovered genetic determinants such as rare variants and gene-gene interactions. Unfortunately, standard methods used to test for association with single common genetic variants are underpowered for detection of rare variants and genetic interactions. This special issue is dedicated to presenting state-of-the-art statistical and computational methods for finding missing heritability underlying complex traits with massive genetic data including GWAS, next-generation sequencing, and DNA microarray data. The main focus of this special issue is on data mining and machine learning for advanced GWAS analysis. The advanced GWAS analysis includes multi-SNP analysis, gene-gene and gene-environment interaction analysis, estimation of missing heritability, and analysis of population heterogeneity. This special issue provides a platform to the researchers with expertise in data mining to discuss recent advancements in analytic approach of post-GWAS association analysis in field of statistics and bioinformatics. The paper by W. Lee et al. 
proposes an approach to identifying clinically interesting subgroups in a heterogeneous population. The identification step uses a clustering algorithm and proposes an improved false discovery rate- (FDR-) based measure to remedy the overestimation of the ordinary FDR-based approach. The paper by Y. Kim et al. performs heritability estimation by using population- and family-based samples. The main idea lies in utilizing genetic relationship matrix to parameterize the variance of a polygenic effect for population-based samples. Three other papers consider gene-gene and gene-environment analysis. First, J. Yee et al. proposed interaction analysis for quantitative traits using entropy. Although there have been several methods proposed for gene-gene interaction using entropy, this is a robust entropy-based gene-gene interaction analysis that does not necessarily require an assumption on the distribution of trait for quantitative traits. Second, S. Y. Lee et al. focused on identifying multi-SNP effects or gene-gene interactions for survival phenotypes. In the framework of the multifactor dimensionality reduction (MDR) method, several extensions for the survival phenotype are considered and compared to the earlier MDR method through comprehensive simulation studies. Third, the paper by H. Xu et al. proposes a new GWAS strategy for detecting gene-gene and gene-environment analysis by combining the generalized multifactor dimensionality reduction-graphics processing unit (GMDR-GPU) algorithm with mixed linear model approach. It was further employed to investigate the genetic architecture of important quality traits in rice. The reliability and efficiency of the model and analytical methods were verified through Monte Carlo simulations. The next two papers discuss multi-SNP analysis. Y. J. Yoo et al. propose a new multi-bin linear combination (MLC) test for multiple SNP analysis. 
It first performs clustering analysis to find cliques, complete subnetworks of SNPs with all pairwise correlations above a threshold, and then performs MLC test. Through simulation studies, the clique-based algorithm was shown to produce smaller clusters with stronger positive correlation than other MLC tests. The paper by S. Won et al. focuses on comparing penalized and nonpenalized methods for disease prediction with large-scale genetic data. It was shown that penalized regressions are usually robust and provide better accuracy than nonpenalized methods for disease prediction. Next, the work of J. Joo et al. considers robust genetic association tests for GWAS. How these robust tests can be applied to the replication study of GWAS and how the overall statistical significance can be evaluated using the combined test formed by p values of the discovery and replication studies were demonstrated. Finally, the paper by L. Li and M. Xiong proposes a dynamic model for RNA-seq data analysis. To extract biologically useful transcription process from the RNA-seq data, the ordinary differential equation (ODE) model was proposed for modeling the RNA-seq data. Differential principal analysis was developed for estimation of location-varying coefficients of the ODE. This special issue discusses the most challenging issues in multiple SNPs approaches including gene-gene interaction and introduces statistical and computational methods for data mining and machine learning for revealing hidden association network of genotype-phenotype relationship. The nine papers in this special issue provide scientists with an overview on the recent advancements in multiple SNP analysis for GWASs. We hope the papers can encourage researchers towards a more extensive use of statistical genetics and bioinformatics techniques for research in biology and medical sciences.
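The editorial above refers to an "ordinary FDR-based approach" without spelling it out. As background only, and assuming this means Benjamini-Hochberg-style control of the false discovery rate, a minimal sketch of that standard step-up procedure is given below. It is not the improved measure proposed by W. Lee et al., and the function name and example p-values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected.

    Standard step-up rule: find the largest rank k (1-based, p-values sorted
    ascending) with p_(k) <= alpha * k / m, then reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            cutoff_rank = rank
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


# Four tests at alpha = 0.05: thresholds are 0.0125, 0.025, 0.0375 and 0.05.
decisions = benjamini_hochberg([0.041, 0.001, 0.216, 0.008])
print(decisions)
```

The two smallest p-values fall under their rank-specific thresholds, so only those two tests are declared significant in this toy case.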

  • Research Article
  • Citations: 9
  • 10.1155/2015/135782
Detection of Epistatic and Gene-Environment Interactions Underlying Three Quality Traits in Rice Using High-Throughput Genome-Wide Data
  • Jan 1, 2015
  • BioMed Research International
  • Haiming Xu + 8 more

With development of sequencing technology, dense single nucleotide polymorphisms (SNPs) have been available, enabling uncovering genetic architecture of complex traits by genome-wide association study (GWAS). However, the current GWAS strategy usually ignores epistatic and gene-environment interactions due to absence of appropriate methodology and heavy computational burden. This study proposed a new GWAS strategy by combining the graphics processing unit- (GPU-) based generalized multifactor dimensionality reduction (GMDR) algorithm with mixed linear model approach. The reliability and efficiency of the analytical methods were verified through Monte Carlo simulations, suggesting that a population size of nearly 150 recombinant inbred lines (RILs) had a reasonable resolution for the scenarios considered. Further, a GWAS was conducted with the above two-step strategy to investigate the additive, epistatic, and gene-environment associations between 701,867 SNPs and three important quality traits, gelatinization temperature, amylose content, and gel consistency, in a RIL population with 138 individuals derived from super-hybrid rice Xieyou9308 in two environments. Four significant SNPs were identified with additive, epistatic, and gene-environment interaction effects. Our study showed that the mixed linear model approach combining with the GPU-based GMDR algorithm is a feasible strategy for implementing GWAS to uncover genetic architecture of crop complex traits.

  • Research Article
  • Citations: 5
  • 10.15177/seefor.14-17
Canopy Layers Stratified Volume Equations for Pinus caribaea Stands in South West Nigeria using Linear Mixed Models
  • Dec 13, 2014
  • South-east European forestry
  • Peter Adesoye

Background and Purpose: Efficient forest stand management requires reliable estimates of growing stock. The reliability of stem volume estimates depends on the range and extent of available sample data. The potential of canopy layer stratification in pure plantations as a means of improving the accuracy of stem volume equations has not been fully explored. The Linear Mixed Model (LMM) approach is a statistical technique capable of yielding a more efficient prediction under clustered data structure. This study investigates the existence and potential of canopy stratification for improving the reliability of stem volume prediction equations under pure plantations using a linear mixed model approach. Materials and Methods: Pinus caribaea Morelet plantations in Oluwa Forest Reserve, Ondo State, Nigeria were investigated. Individual tree growth variables, including diameters, heights and crown measurements, were obtained in 2010 on twenty-five 0.04 ha plots representing five different stands planted between 1979 and 1991. Visual assessment of the trees within each plot was also done to classify them into four canopy strata (i.e. dominant, co-dominant, intermediate and suppressed). A linear mixed model approach was used to analyze the tree growth data using SAS Proc Mixed. Two variants of volume equations, simple linear and exponential, were investigated. Results: Results show that the simple linear mixed model consistently gives better fit criteria (e.g. AIC) of 135.8, 18.9, -214.7 and -174.6 under dominant, co-dominant, intermediate and suppressed canopy layers, respectively. The covariance parameter estimate for dominant canopy (0.2219) is about 370 times as large as that of suppressed (0.0006). This implies that canopy layers not only influence stem volume prediction but also reduce within-stand variance.

  • Research Article
  • Citations: 1
  • 10.1007/s11032-025-01550-8
Genomic selection in a kiwiberry breeding programme: integrating intra- and inter-specific crossing
  • Mar 1, 2025
  • Molecular Breeding
  • Daniel Mertten + 6 more

Inter-specific hybridisation between natural populations within the genus Actinidia is a common phenomenon and has been used in breeding programmes. Hybridisation between species increases the diversity of breeding populations, incorporating new desirable traits into potential cultivars. We explored genomic prediction in Actinidia breeding, focusing on the closely related species Actinidia arguta and Actinidia melanandra. We investigated the potential of genomic selection by analysing four quantitative traits across intra-specific A. arguta crosses and inter-specific crosses between A. arguta and A. melanandra. The continuous distributions of the studied traits in both intra-specific and inter-specific crosses indicated a polygenic background. A linear mixed model approach was used, incorporating the factor of year of season and a marker-based relationship matrix instead of pedigree as a random effect. After evaluation, the best model was applied to assess variance components and heritability for each quantitative trait. Expanding beyond intra-specific crosses, predictive ability was calculated to investigate inter-specific cross effect. Considering predictive ability, this study explored the impacts of sample size and population structure. A reduction in sample size correlated with decreased predictive ability, while the influence of population structure was particularly pronounced in inter-specific crosses. Finally, the prediction accuracy of genomic estimated breeding values, for parental genotypes, revealed an inter-species effect on prediction confidence. Considering the imbalance in genotype numbers between intra- and inter-specific cross populations, this research highlights the difficulty of genomic prediction in hybrid populations. Understanding prediction accuracy in inter-species crossing designs provides valuable insights for optimising genomic selection.

  • Front Matter
  • Citations: 5
  • 10.1016/j.gde.2013.11.010
Systems biology and the analysis of genetic variation
  • Nov 28, 2013
  • Current Opinion in Genetics & Development
  • Shamil R Sunyaev + 1 more


  • Peer Review Report
  • 10.7554/elife.82459.sa2
Author response: Genetic architecture of natural variation of cardiac performance from flies to humans
  • Oct 11, 2022
  • Saswati Saha + 14 more


  • Peer Review Report
  • 10.7554/elife.82459.sa1
Decision letter: Genetic architecture of natural variation of cardiac performance from flies to humans
  • Sep 29, 2022
  • Detlef Weigel

Abstract
Deciphering the genetic architecture of human cardiac disorders is of fundamental importance but their underlying complexity is a major hurdle. We investigated the natural variation of cardiac performance in the sequenced inbred lines of the Drosophila Genetic Reference Panel (DGRP). Genome-wide association studies (GWAS) identified genetic networks associated with natural variation of cardiac traits which were used to gain insights as to the molecular and cellular processes affected. Non-coding variants that we identified were used to map potential regulatory non-coding regions, which in turn were employed to predict transcription factor (TF) binding sites. Cognate TFs, many of which themselves bear polymorphisms associated with variations of cardiac performance, were also validated by heart-specific knockdown. Additionally, we showed that the natural variations associated with variability in cardiac performance affect a set of genes overlapping those associated with average traits but through different variants in the same genes. Furthermore, we showed that phenotypic variability was also associated with natural variation of gene regulatory networks. More importantly, we documented correlations between genes associated with cardiac phenotypes in both flies and humans, which supports a conserved genetic architecture regulating adult cardiac function from arthropods to mammals. Specifically, roles for PAX9 and EGR2 in the regulation of the cardiac rhythm were established in both models, illustrating that the characteristics of natural variations in cardiac function identified in Drosophila can accelerate discovery in humans. 
Editor's evaluation
The authors investigated natural variation and new genetic mechanisms underlying cardiac performance using sequenced inbred lines of the Drosophila Genetic Reference Panel. The study provides insights into the genetic architecture of complex cardiac performance traits and represents an important resource for researchers studying cardiac performance. https://doi.org/10.7554/eLife.82459.sa0
Introduction
Heart disease is a major cause of mortality (Bezzina et al., 2015). Although a large number of genome-wide association studies (GWAS) have identified hundreds of genetic variants related to cardiovascular traits (Roselli et al., 2018; van Setten et al., 2018; Shah et al., 2020; Verweij et al., 2020), we are very far from a comprehensive understanding of the genetic architecture of these complex traits. Deciphering the impact of genetic variations on quantitative traits is however critical for the prediction of disease risk. But disentangling the relative genetic and environmental contributions to pathologies is challenging due to the difficulty in accounting for environmental influences and disease comorbidities. Underlying epistatic interactions may also contribute to problems with replication in human GWAS performed in distinct populations, which rarely take epistatic effects into account. In addition, linking a trait-associated locus to a candidate gene or a set of genes for prioritization is not straightforward (Mackay, 2014; Boyle et al., 2017). Furthermore, the analysis of genetic factors related to cardiac traits is complicated by their interactions with several risk factors, such as increasing age, hypertension, diabetes mellitus, ischemic, and structural heart disease (Paludan-Müller et al., 2016). These pitfalls can be overcome using animal models. Model organisms allow precise control of the genetic background and environmental rearing conditions. 
They can provide generally applicable insights into the genetic underpinnings of complex traits and human diseases, due to the evolutionary conservation of biological pathways. Numerous studies have highlighted the conservation of cardiac development and function from flies to mammals. Indeed, orthologous genes control the early development as well as the essential functional elements of the heart. The fly is the simplest genetic model with a heart muscle and is increasingly used to identify the genes involved in heart disease and aging (Ocorr et al., 2007b; Diop and Bodmer, 2015; Rosenthal et al., 2010). Although a large number of genes are implicated in establishing and maintaining cardiac function in Drosophila (Neely et al., 2010), the extent to which genes identified from mutant analysis reflect naturally occurring variants is neither known, nor do we know how allelic variants at several segregating loci combine to affect cardiac performance. We previously showed that wild populations of flies harbor rare polymorphisms of major effects that predispose them to cardiac dysfunction (Ocorr et al., 2007a). Here, we analyzed the genetic architecture of the natural variation of cardiac performance in Drosophila. Our aims were to (i) identify the variants associated with cardiac traits found in a natural population, (ii) decipher how these variants interact with each other and with the environment to impact cardiac performance, and (iii) gain insights into the molecular and cellular processes affected. For this, we used the Drosophila Genetic Reference Panel (DGRP) (Mackay et al., 2012; Huang et al., 2014), a community resource of sequenced inbred lines. Previous GWAS performed in the DGRP indicate that inheritance of most quantitative traits in Drosophila is complex, involving many genes with small additive effects as well as epistatic interactions (Mackay and Huang, 2018). 
The use of inbred lines allows us to assess the effects of genetic variations in distinct but constant genetic backgrounds and discriminate genetic and environmental effects. We demonstrated substantial among-lines variations of cardiac performance and identified genetic variants associated with the cardiac traits together with epistatic interactions among polymorphisms. Candidate loci were enriched for genes encoding transcription factors (TFs) and signaling pathways, which we validated in vivo. We used non-coding variants - which represented the vast majority of identified polymorphisms – for predicting transcriptional regulators of associated genes. Corresponding TFs were further validated in vivo by heart-specific RNAi-mediated knockdown (KD). This illustrates that natural variations of gene regulatory networks have widespread impact on cardiac function. In addition, we analyzed the phenotypic variability of cardiac traits between individuals within each of the DGRP lines (i.e., with the same genotype), and we documented significant diversity in phenotypic variability among the DGRP lines, suggesting genetic variations influenced phenotypic variability of cardiac performance. Genetic variants associated with this phenotypic variability were identified and shown to affect a set of genes that overlapped with those associated with trait means, although through different genetic variants in the same genes. Comparison of human GWAS of cardiac disorders with results in flies identified a set of orthologous genes associated with cardiac traits both in Drosophila and in humans, supporting the conservation of the genetic architecture of cardiac performance traits, from arthropods to mammals. siRNA-mediated gene KD were performed in human induced pluripotent stem cells derived cardiomyocytes (hiPSC-CMs) to indeed show that dmPox-meso/hPAX9 and dmStripe/hEGR2 have conserved functions in cardiomyocytes from both flies and humans. 
These new insights into the fly’s genetic architecture and the connections between natural variations and cardiac performance permit the accelerated identification of essential cardiac genes and pathways in humans.
Results
Quantitative genetics of heart performances in the DGRP
In this study, we aimed to evaluate how naturally occurring genetic variations affect cardiac performance in young Drosophila adults and identify variants and genes involved in the genetic architecture of cardiac traits. To assess the magnitude of naturally occurring variations of the traits, we measured heart parameters in 1-week-old females for 167 lines from the DGRP, a publicly available population of sequenced inbred lines derived from wild caught females (Figure 1A). Briefly, semi-intact preparations of individual flies (Ocorr et al., 2007c) were used for high-speed video recording combined with Semi-automated Heartbeat Analysis (SOHA) software (http://www.sohasoftware.com/) which allows precise quantification of a number of cardiac function parameters (Fink et al., 2009; Cammarato et al., 2015). Fly cardiac function parameters are highly influenced by sex (Wessells et al., 2004). Due to the experimental burden of analyzing individual cardiac phenotypes, we focused on female flies only and designed our experiment in the following way: we randomly selected 14 lines out of 167 that were replicated twice. The remaining 153 lines were replicated once. Each replicate was composed of 12 individuals. No block effect was observed due to the replicates in the 14 selected lines (see Supplementary file 1a). This allowed us to perform our final analysis on one replicate of each of the 167 lines. A total sample of 1956 individuals was analyzed. Seven cardiac traits were analyzed across the whole population (Figure 1—source data 1 and Table 1). 
As illustrated in Figure 1A, we analyzed phenotypes related to the rhythmicity of cardiac function: the systolic interval (SI) is the time elapsed between the beginning and the end of one contraction, the diastolic interval (DI) is the time elapsed between two contractions and the heart period (HP) is the duration of a total cycle (contraction + relaxation, DI + SI). The arrhythmia index (AI = std-dev(HP)/mean(HP)) is used to evaluate the variability of the cardiac rhythm. In addition, three traits related to contractility were measured: the diameters of the heart in diastole (end diastolic diameter [EDD]) and in systole (end systolic diameter [ESD]), and the fractional shortening (FS), which measures the contraction efficacy ((EDD − ESD)/EDD). We found significant genetic variation for all traits (Figure 1B and Figure 1—figure supplement 1) with broad sense heritability ranging from 0.30 (AI) to 0.56 (EDD) (Table 1). Except for EDD/ESD and HP/DI, quantitative traits were poorly correlated with each other (Figure 1—figure supplement 1).
Figure 1 (with 2 supplements): Quantitative genetics and genome-wide association studies (GWAS) for cardiac traits in the Drosophila Genetic Reference Panel (DGRP). (A) Left: Cardiac performance traits were analyzed in 167 sequenced inbred lines from the DGRP population. Approximately 12 females per line were analyzed. Right panels: Schematic of the Drosophila adult heart assay and example of M-mode generated from video recording of a beating fly heart. Semi-intact preparations of 1-week-old adult females were used for high-speed video recording followed by automated and quantitative assessment of heart size and function. The representative M-mode trace illustrates the cardiac traits analyzed. 
DI: diastolic interval; SI: systolic interval; HP: heart period (duration of one heartbeat); EDD: end diastolic diameter (fully relaxed cardiac tube diameter); ESD: end systolic diameter (fully contracted cardiac tube diameter). Fractional shortening (FS = (EDD − ESD)/EDD) and arrhythmia index (AI = Std Dev(HP)/mean(HP)) were additionally calculated and analyzed. (B) Distribution of line means and within lines variations (box plots) from 167 measured DGRP lines for HP and EDD. DGRP lines are ranked by their increasing mean phenotypic values. For both phenotypes, representative M-modes from extreme lines are shown below (other traits are displayed in Figure 1—figure supplement 1). (C) Pearson residuals of chi-square test from the comparison of indicated single nucleotide polymorphism (SNP) categories in the DGRP and among variants associated with cardiac traits. According to DGRP annotations, SNPs are attributed to genes if they are within the gene transcription unit (5’ and 3’ UTR, synonymous and non-synonymous coding, introns) or within 1 kb from transcription start and end sites (1 kb upstream, 1 kb downstream). NA: SNPs not attributed to genes (>1 kb from transcription start site [TSS] and transcription end sites [TES]). (D) Comparison of gene sets identified by single marker using Fast-LMM (LMM) and in interaction using FastEpistasis (Epistasis). The Venn diagram illustrates the size of the two populations and their overlap. (E) Overlap coefficient of gene sets associated with the different cardiac traits analyzed.

Figure 1—source data 1. Individual values for cardiac traits analyzed across the 167 Drosophila Genetic Reference Panel (DGRP) lines. Individual and DGRP line number are indicated. Phenotypic values were determined from high-speed video recording on dissected flies and movie analysis using Semi-automated Heartbeat Analysis (SOHA) (Mackay et al., 2012).
https://cdn.elifesciences.org/articles/82459/elife-82459-fig1-data1-v1.xlsx

Figure 1—source data 2. Variants identified by FastLMM as associated to indicated phenotypes. Among the 100 best ranked associations, only variants with MAF >4% were retained. Tables for variants mapped to genes and for variants that are not within gene mapping criteria (>1 kb from transcription start site [TSS] and transcription end sites [TES]) are indicated. https://cdn.elifesciences.org/articles/82459/elife-82459-fig1-data2-v1.xlsx

Figure 1—source data 3. All FastEpistasis data on mean phenotypes, per quantitative trait. Single nucleotide polymorphism (SNP) ID, position, associated genes, and statistics are indicated for both focal SNPs (left) and their interacting SNPs (right). Each sheet displays the results for indicated quantitative traits, except for the first one which is a merge of all quantitative traits association analyses. https://cdn.elifesciences.org/articles/82459/elife-82459-fig1-data3-v1.xlsx

Table 1. Quantitative genetics of cardiac traits in the Drosophila Genetic Reference Panel (DGRP). Summary statistics over all DGRP genotypes assayed. Number of lines and individuals (after outlier removal, see Materials and methods) analyzed for each cardiac trait is indicated. Mean, standard deviation (Std dev.), and coefficient of variation (Coef. Var) among the whole population are indicated. Genetic, environment, and phenotypic variance (respectively Genet. var, Env. var, and Phen. var) were calculated for each trait. Broad sense heritability of traits means (H2) suggested heritability of corresponding traits. Levene test indicated significant heterogeneity of the variance among the lines.
Statistic | Diastolic intervals | Systolic intervals | Heart period | Diastolic diameter | Systolic diameter | Fractional shortening | Arrhythmia index
total.nb.lines | 167 | 167 | 167 | 167 | 167 | 167 | 167
mean | 0.4638 | 0.2166 | 0.6883 | 79.4200 | 51.0500 | 0.3538 | 0.2475
Std dev. | 0.26330 | 0.03216 | 0.27690 | 14.09000 | 9.49300 | 0.06837 | 0.29230
Coef. var | 0.5677 | 0.1485 | 0.4022 | 0.1774 | 0.1860 | 0.1933 | 1.1810
lines (mean) | 165 | 166 | 165 | 159 | 157 | 158 | 166
Indiv. (mean) | 1914 | 1911 | 1920 | 1779 | 1753 | 1767 | 1832
lines (Cve) | 165 | 166 | 165 | 159 | 157 | 158 | 166
Indiv. (Cve) | 1914 | 1911 | 1920 | 1779 | 1753 | 1767 | 1832
Genet. var | 2.59e-02 | 5.03e-04 | 2.87e-02 | 1.13e+02 | 4.39e+01 | 1.57e-03 | 2.21e-02
Env. var | 4.36e-02 | 5.35e-04 | 4.82e-02 | 8.64e+01 | 4.65e+01 | 3.11e-03 | 6.35e-02
Phen. var | 6.95e-02 | 1.04e-03 | 7.68e-02 | 1.99e+02 | 9.04e+01 | 4.68e-03 | 8.56e-02
H2 | 0.373 | 0.485 | 0.373 | 0.566 | 0.485 | 0.335 | 0.258
F value | 76,864 | 11,686 | 74,715 | 46,950 | 15,041 | 11,164 | 65,308
Pr(F) | 8.8e-120 | 2.3e-187 | 5.8e-116 | 7.1e-62 | 1.9e-231 | 8.8e-175 | 1.8e-96
Levene test | 1.9e-10 | 1.9e-10 | 1.7e-08 | 1.6e-05 | 2.1e-13 | 1.6e-05 | 2.1e-13

GWAS analyses of heart performance

To identify candidate variants associated with cardiac performance variation, we performed GWAS analyses and evaluated single marker associations of line means with common variants using a linear mixed model (Lippert et al., 2011) and after accounting for effects of Wolbachia infection and common polymorphism inversions (see Materials and methods). Genotype-phenotype associations were performed separately for all seven quantitative traits and variants were ranked based on their p-values. For most of the phenotypes analyzed, quantile-quantile (QQ) plots were uniform (Figure 1—figure supplement 2) and none of the variants reached the strict Bonferroni correction threshold for multiple tests (2 × 10⁻⁸), which is usual in the DGRP given the size of the population. However, the decisive advantage of the Drosophila system is that we can use GWA analyses as primary screens for candidate genes and mechanisms that can be subsequently validated by different means. We therefore chose to analyze the 100 top ranked variants for each quantitative trait.
This choice is based on our strategy to test the selected single nucleotide polymorphisms (SNPs) and associated genes by a variety of approaches (data mining and experimental validation; see below) in order to provide a global validation of association results and to gain insights into the characteristics of the genetic architecture of the cardiac traits. This cut-off was chosen in order to be able to test a significant number of variants while being globally similar to the nominal cut-off (10⁻⁵) generally used in DGRP analyses. A large proportion of the variants retained have indeed a p-value below 10⁻⁵. Selected variants were further filtered on the basis of minor allele frequency (MAF >4%) (Figure 1—source data 2, Supplementary file 1b). Among the seven quantitative traits analyzed, we identified 530 unique variants. These variants were associated to genes if they were within 1 kb of transcription start site (TSS) or transcription end sites (TES). Using these criteria, 417 variants were mapped to 332 genes (Supplementary file 1c). We performed a chi-squared test to determine if the genomic location of variants associated with cardiac traits is biased toward any particular genomic region when compared with the whole set of variants with MAF >4% in the DGRP population and obtained a p-value of 2.778e-13. Genomic locations of the variants were biased toward regions within 1 kb upstream of genes TSS, and, to a lesser extent, to genes 5’ UTR (Figure 1C). Variants not mapped to genes (located at >1 kb from TSS or TES) were slightly depleted in our set. In GWAS analyses, loci associated with a complex trait collectively account for only a small proportion of the observed genetic variation (Manolio et al., 2009) and part of this ‘missing heritability’ is thought to come from interactions between variants (Flint and Mackay, 2009; Manolio et al., 2009; Huang et al., 2012; Shorter et al., 2015).
As a first step toward identifying such interactions, we used FastEpistasis (Schüpbach et al., 2010). SNPs identified by GWAS were used as focal SNPs and were tested for interactions with all other SNPs in the DGRP. FastEpistasis reports the best ranked interacting SNP for each starting focal SNP, thus extending the network of variants and genes associated to natural variation of cardiac performance, which were used for hypothesis generation and functional validations; 288 unique SNPs were identified, which were mapped to 261 genes (Figure 1—source data 3, Supplementary file 1e). While none of the focal SNPs interacted with each other, there is a significant overlap between the 332 genes associated with single marker GWAS and the 261 genes identified by epistasis (n=31, Figure 1D and Supplementary file 1e, fold change (FC)=6; hypergeometric pval=6.8 × 10⁻¹⁶). This illustrates that the genes that contribute to quantitative variations in cardiac performance have a tendency to interact with each other, although through distinct alleles. Taken together, single marker GWAS and epistatic interactions performed on the seven cardiac phenotypes identified a compendium of 562 genes associated with natural variations of heart performance (Supplementary file 1f). In line with the correlation noted between their phenotypes (Figure 1—figure supplement 1B), the GWAS for HP and DI identified partially overlapping gene sets (overlap index 0.23, Figure 1E). The same was true, to a lesser extent, for ESD and EDD (0.15). Otherwise, the sets of genes associated with each of the cardiac traits are poorly correlated with each other.

Functional annotations and network analyses of association results

Our next objective was to identify the biological processes potentially affected by natural variation in cardiac performance.
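The overlap statistics reported above for the single marker GWAS and epistasis gene sets (31 shared genes, a fold change of about 6, and a hypergeometric p-value) can be illustrated with a short Python sketch. The counts 332, 261 and 31 come from the text. The size of the background gene universe is not stated here, so `ASSUMED_UNIVERSE` is a hypothetical value chosen only for illustration, and the helper names are ours rather than the authors'. The overlap coefficient uses a common definition (shared genes divided by the size of the smaller set), which may differ from the exact index used for Figure 1E.

```python
from fractions import Fraction
from math import comb


def hypergeom_tail(overlap, set_a, set_b, universe):
    """P(X >= overlap) when set_b genes are drawn without replacement
    from a universe that contains set_a 'marked' genes."""
    upper = min(set_a, set_b)
    total = comb(universe, set_b)
    hits = sum(
        comb(set_a, k) * comb(universe - set_a, set_b - k)
        for k in range(overlap, upper + 1)
    )
    # Exact rational arithmetic avoids overflow before converting to float.
    return float(Fraction(hits, total))


def fold_change(overlap, set_a, set_b, universe):
    """Observed overlap divided by the overlap expected by chance."""
    expected = set_a * set_b / universe
    return overlap / expected


def overlap_coefficient(overlap, set_a, set_b):
    """Shared genes divided by the size of the smaller gene set."""
    return overlap / min(set_a, set_b)


GWAS_GENES = 332         # genes from single marker GWAS (from the text)
EPISTASIS_GENES = 261    # genes from FastEpistasis (from the text)
SHARED_GENES = 31        # genes found by both (from the text)
ASSUMED_UNIVERSE = 16_800  # hypothetical background size, NOT from the text

fc = fold_change(SHARED_GENES, GWAS_GENES, EPISTASIS_GENES, ASSUMED_UNIVERSE)
p_value = hypergeom_tail(SHARED_GENES, GWAS_GENES, EPISTASIS_GENES, ASSUMED_UNIVERSE)
oc = overlap_coefficient(SHARED_GENES, GWAS_GENES, EPISTASIS_GENES)

print(f"fold change: {fc:.2f}")
print(f"hypergeometric tail p-value: {p_value:.2e}")
print(f"overlap coefficient: {oc:.3f}")
```

Changing the assumed universe shifts both the fold change and the tail probability, so the printed values should not be expected to match the published ones exactly.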
Gene Ontology (GO) enrichment analysis of the combined single marker GWAS and epistatic interactions analyses indicated that genes encoding signaling receptors, TFs, and cell adhesion molecules were over-represented among these gene sets (pval=1.4 × 10⁻⁹ [FC=2.9], 5 × 10⁻⁴ [FC=2], and 3 × 10⁻³ [FC=4.6], respectively). There was also a bias for genes encoding proteins located at the plasma membrane, at ion channel complexes as well as components of contractile fibers (pval=3.4 × 10⁻¹⁰ [FC=3], 7 × 10⁻⁵ [FC=4.2], and 4 × 10⁻² [FC=3.6]; Figure 2A; Supplementary file 2a). Of note, although a number of genes have previously been identified as being required during heart development or for the establishment and maintenance of cardiac function by single gene approaches, we found no enrichment for these gene categories in our analysis. In addition, genes identified in a global screen for stress-induced lethality following heart-specific RNAi KD (Neely et al., 2010) were also not enriched in GWAS detected genes (FC=1; Supplementary file 2b). This indicates that genes associated to natural variations of cardiac traits are typically missed by traditional forward or reverse genetics approaches, which highlights the value of our approach.

Figure 2 (with 2 supplements). Functional annotations and validations of genes associated with genome-wide associations studies (GWAS) for cardiac performance. (A) Gene Ontology (GO) enrichment analyses. Selected molecular functions (MF, left) and cellular components (CC, right) associated with cardiac performances at FDR < 0.05 are shown. Enrichment analysis was performed using G:profiler with a correction for multitesting (see Materials and methods). (B) Interaction network of genes associated with natural variations of cardiac performance.
Direct genetic and interactions between cardiac fly GWAS genes are genes interactions to single marker epistasis to the cardiac performances for which associations were and proteins highlighted in to transcription factors, in and to signaling pathways and and in to ion (C) the effects on indicated cardiac traits of heart-specific RNAi-mediated knockdown of genes identified in GWAS for cardiac performance. Results of test of the effects of indicated heart-specific RNAi-mediated gene KD for cardiac performance traits analyzed on semi-intact females data are in Figure data of genes tested to significant effects on cardiac performance traits indicate the for which the corresponding gene was associated in not were for multiple using Bonferroni Comparison with heart-specific effect of selected genes is displayed in Figure supplement (D) Schematic of and pathways in Drosophila. (E) Genetic interactions between and genes. Genetic interactions tested between and for and between and for (other phenotypes are shown in Figure supplement Cardiac traits were measured on each single and on that the interaction between and for and between and for are data for interaction effect corresponding to all phenotypes are displayed in Figure supplement Figure data 1 of and genetic interactions identified in Drosophila. Download Figure data 2 Data from validation RNAi validations and tests for genetic interactions among Download In order to gain the cellular and molecular pathways affected by natural variations of cardiac traits, we have mapped the associated genes and gene interaction networks. Of the 562 identified genes, were mapped to the fly that both and genetic interactions from both et al., 2011) and (see Materials and methods and Figure data 1). 
a proportion of the GWAS identified genes were within the fly and a large network of interacting (Supplementary file and Figure suggesting that they in common biological This network several TFs and ion channel complex genes, with their potential in the genetic architecture of natural variation of heart performance. components of signaling pathways are also in the of the and pathways (see Functional validations of candidate genes To assess in an in genes SNPs associated with variation in cardiac traits to these phenotypes, we selected GWAS associated genes for RNAi KD and tested the effects on cardiac performance. We selected genes that were identified in at two GWAS for two traits or that were to be in the adult heart et al., and for which RNAi lines were were tested in 1-week-old adult female using the heart-specific et al., and the same semi-intact heart preparations and analyses as for DGRP lines of the selected genes to of cardiac performance following heart-specific KD (Figure In we tested the effect on cardiac performance of genes randomly selected in the – the GWAS associated genes being from the (see Materials and methods and Figure supplement 1). Although a number of these genes to cardiac phenotypes when – which is with that quantitative traits can be influenced by a large number of genes et al., – when in the the genes selected from GWAS to phenotypes compared to the randomly selected genes (Figure supplement 1). These results therefore our association is important to that our is to the effects of gene of the variants may to gene function this can to a that is to In addition, of the associated variants may heart function by which not be replicated by RNAi We further focused on the of both and pathways were identified in our analyses. We tested different of the for cardiac phenotypes using RNAi KD (Figure and the of the and the and pathways are (Figure identification in our GWAS that they in a to heart function. 
may reflect their in different of cardiac development functional In order to discriminate between these two we tested if different components of these pathways interacted Single for of function show effects of and on several phenotypes, an of their in several cardiac traits (Figure supplement compared to each single flies showed phenotypes · suggesting a genetic interaction (Figure and Figure supplement is however that is also a transcriptional of the et al., The effect observed in can therefore as a of an signaling the We thus tested other allelic for of function of and pathways.

  • Peer Review Report
  • 10.7554/elife.82459.sa0
Editor's evaluation: Genetic architecture of natural variation of cardiac performance from flies to humans
  • Sep 29, 2022
  • Detlef Weigel

  • Research Article
  • Cite Count Icon 35
  • 10.1186/s12864-018-4837-0
Dissection of complicate genetic architecture and breeding perspective of cottonseed traits by genome-wide association study
  • Jun 13, 2018
  • BMC Genomics
  • Xiongming Du + 21 more

Background: Cottonseed is one of the most important raw materials for plant protein, oil and alternative biofuel for diesel engines. Understanding the complex genetic basis of cottonseed traits is requisite for achieving efficient genetic improvement of the traits. However, it is not yet clear about their genetic architecture in genomic level. GWAS has been an effective way to explore genetic basis of quantitative traits in human and many crops. This study aims to dissect genetic mechanism seven cottonseed traits by a GWAS for genetic improvement.
Results: A genome-wide association study (GWAS) based on a full gene model with gene effects as fixed and gene-environment interaction as random, was conducted for protein, oil and 5 fatty acids using 316 accessions and ~390 K SNPs. Totally, 124 significant quantitative trait SNPs (QTSs), consisting of 16, 21, 87 for protein, oil and fatty acids (palmitic, linoleic, oleic, myristic, stearic), respectively, were identified and the broad-sense heritability was estimated from 71.62 to 93.43%; no QTS-environment interaction was detected for the protein, the palmitic and the oleic contents; the protein content was predominantly controlled by epistatic effects accounting for 65.18% of the total variation, but the oil content and the fatty acids except the palmitic were mainly determined by gene main effects and no epistasis was detected for the myristic and the stearic. Prediction of superior pure line and hybrid revealed the potential of the QTSs in the improvement of cottonseed traits, and the hybrid could achieve higher or lower genetic values compared with pure lines.
Conclusions: This study revealed complex genetic architecture of seven cottonseed traits at whole genome-wide by mixed linear model approach; the identified genetic variants and estimated genetic component effects of gene, gene-gene and gene-environment interaction provide cotton geneticist or breeders new knowledge on the genetic mechanism of the traits and the potential molecular breeding design strategy.

  • Research Article
  • Cite Count Icon 54
  • 10.1371/journal.pbio.1001008
The Importance of Synthetic Associations Will Only Be Resolved Empirically
  • Jan 18, 2011
  • PLoS Biology
  • David B Goldstein
