Abstract

Article Figures and data Abstract Editor's evaluation Introduction Results Discussion Methods Data availability References Decision letter Author response Article and author information Metrics Abstract Adaptive ecotype formation can be the first step to speciation, but the genetic underpinnings of this process are poorly understood. Marine midges of the genus Clunio (Diptera) have recolonized Northern European shore areas after the last glaciation. In response to local tide conditions they have formed different ecotypes with respect to timing of adult emergence, oviposition behavior and larval habitat. Genomic analysis confirms the recent establishment of these ecotypes, reflected in massive haplotype sharing between ecotypes, irrespective of whether there is ongoing gene flow or geographic isolation. QTL mapping and genome screens reveal patterns of polygenic adaptation from standing genetic variation. Ecotype-associated loci prominently include circadian clock genes, as well as genes affecting sensory perception and nervous system development, hinting to a central role of these processes in ecotype formation. Our data show that adaptive ecotype formation can occur rapidly, with ongoing gene flow and largely based on a re-assortment of existing alleles. Editor's evaluation This valuable study combines phenotypic analysis, quantitative genetics and population genomics to provide solid evidence for multiple genes underlying adaptive divergence in a marine midge system linked to tidal rhythm. Genes with a plausible role in perceiving and responding to lunar information are among the loci that most highly differentiate populations with distinct behaviors. https://doi.org/10.7554/eLife.82824.sa0 Decision letter Reviews on Sciety eLife's review process Introduction The genetic foundations for rapid ecological adaptation and speciation remain poorly understood. Decades of research have focused on identifying single or few major genetic loci in such contexts, with some notable successes (Wu and Ting, 2004; Phadnis and Orr, 2009; Nosil and Schluter, 2011; Van’t Hof et al., 2016). However, many genome-wide association studies have now shown that quantitative trait phenotypes have usually a polygenic basis (Sella and Barton, 2019; Boyle et al., 2017). This implies that also rapid natural adaptation should be expected to occur in a polygenic context, based on subtle allele frequency changes at many loci with small effects (Barton, 2022; Barghi et al., 2020; Le Corre and Kremer, 2012). A classic example for polygenic adaptation is human height (Turchin et al., 2012; Robinson et al., 2015), and other examples from humans may include birth weight and female hip size (Field et al., 2016). However, powerful datasets to detect polygenic adaptations are still mostly limited to humans and agriculturally relevant species (Barghi et al., 2020), and examples of polygenic adaptation in natural populations are rare. To show that polygenic adaptation occurs in natural populations requires an exceptional biogeographic setting, as well as deep genomic and genetic analysis. One such example may be found in Midas cichlids, where a polygenic trait architecture was found to promote rapid and stable sympatric speciation (Kautt et al., 2020). This finding is particularly interesting, since theoretical work suggested that under scenarios with gene flow the genetic architecture of adaptation may – compared to adaptation without gene flow – rather consist of fewer, tightly linked loci with larger effects (Yeaman and Whitlock, 2011). Notably, the same study also suggested that the resulting large-effect QTL may often be composed of many tightly linked loci of smaller effects (Yeaman and Whitlock, 2011). Without migration we expect an exponential distribution of allele effects, with few large effect loci and many loci of small effects (Barghi et al., 2020; Orr, 1998). In this study, we examine local adaptation and population divergence in another exceptional biogeographic setting, namely the postglacial evolution of new ecotypes in marine midges of the genus Clunio (Diptera: Chironomidae). In adaptation to their habitat, Clunio midges differ in oviposition behavior and reproductive timing, involving both circadian and circalunar clocks. Circalunar clocks are biological time-keeping mechanisms that allow organisms to anticipate lunar phase (Neumann, 2014). Their molecular basis is unknown (Andreatta and Tessmar-Raible, 2020), making identification of adaptive loci for lunar timing both particularly challenging and interesting. In many marine organisms, circalunar clocks synchronize reproduction within a population. In Clunio marinus they have additional ecological relevance (Kaiser, 2014). Living in the intertidal zone of the European Atlantic coasts, C. marinus requires the habitat to be exposed by the tide for successful oviposition. The habitat is maximally exposed during the low waters of spring tide days around full moon and new moon. Adult emergence is restricted to these occasions by a circalunar clock, which tightly regulates development and maturation. Additionally, a circadian clock ensures emergence during only one of the two daily low tides. The adults reproduce immediately after emergence and die few hours later. As tidal regimes vary dramatically along the coastline, populations of this Atlantic ecotype of C. marinus show local timing adaptations (Kaiser, 2014; Neumann, 1967; Kaiser et al., 2011) for which the genetic underpinnings are partially known (Kaiser et al., 2016; Kaiser and Heckel, 2012). When Clunio expanded into the North after the regression of the ice shield, new ecotypes evolved in the Baltic Sea (Remmert, 1955; Palmén and Lindeberg, 1959) and in the high Arctic (Neumann and Honegger, 1969; Pflüger and Neumann, 1971), which differ from the Atlantic ecotype of C. marinus in various ways (see Figure 1 for a summary of defining characteristics of the three ecotypes). In the Baltic Sea the tides are negligible and the Baltic ecotype oviposits on the open water, from where the eggs quickly sink to the submerged larval habitat at water depths of up to 20 m (Remmert, 1955; Endraß, 1976a). Reproduction of the Baltic ecotype happens every day precisely at dusk under control of a circadian clock (Heimbach, 1978), without a detectable circalunar rhythm (Endraß, 1976a). Near Bergen (Norway) the Baltic and Atlantic ecotypes were reported to co-occur in sympatry, but in temporal reproductive isolation. The Baltic ecotype reproduces at dusk, the Atlantic ecotype reproduces during the afternoon low tide (Heimbach, 1978). Therefore, the Baltic ecotype is currently considered a separate species – C. balticus. However, C. balticus and C. marinus can be successfully interbred in the laboratory (Heimbach, 1978). Figure 1 with 1 supplement see all Download asset Open asset Northern European ecotypes of Clunio and their circalunar rhythms. The Atlantic, Arctic and Baltic ecotypes of Clunio differ mainly in their circalunar rhythms (B–H), circadian rhythms (Figure 1—figure supplement 1), as well as their habitat and the resulting oviposition behavior (see ‘Assessment of oviposition behavior’ in Methods section). (A) Sampling sites for this study. (B–H) Circalunar rhythms of adult emergence in corresponding laboratory strains under common garden conditions, with 16 hours of daylight and simulated tidal turbulence cycles to synchronize the lunar rhythm. In Arctic and Baltic ecotypes the lunar rhythm is absent (E,G,H) or very weak (F). Por-1SL: n=1,263; He-1SL: n=2,075; Ber-1SL: n=230; Tro-tAR: n=209; Ber-2AR: n=399; Seh-2AR: n=380; Ar-2AR: n=765. In the high Arctic there are normal tides and the Arctic ecotype of C. marinus is found in intertidal habitats (Neumann and Honegger, 1969). During its reproductive season, the permanent light of polar day precludes synchronization of the circadian and circalunar clocks with the environment. Thus, the Arctic ecotype relies on a so-called tidal hourglass timer, which allows it to emerge and reproduce during every low tide (Pflüger and Neumann, 1971). It does not show circalunar or circadian rhythms (Pflüger and Neumann, 1971). The geological history of Northern Europe (Patton et al., 2017) implies the Baltic Sea and the high Arctic could only be colonized by Clunio after the last glacial ice shield regressed about 10,000 years ago. This inference is also supported by subfossil Clunio head capsules in Baltic Sea sediment cores (Hofmann and Winn, 2000). The new adaptations to the absence of tides in the Baltic Sea and to polar day in the high Arctic must therefore have occurred within this time frame. One strength of our model system is that we know the major adaptations specific to each ecotype, that is we know the major axes of recent selection. This makes Clunio a particularly interesting model system for studying the genetics of rapid adaptation. At the same time it promises to yield insights into the genetic pathways underlying the ecotype characteristics, including circadian and circalunar timing. Results Clunio ecotypes Starting from field work in Northern Europe (Figure 1A), we established one laboratory strain of the Arctic ecotype from Tromsø (Norway, Tro-tAR; see methods for strain nomenclature) and three laboratory strains of the Baltic ecotype, from Bergen (Norway, Ber-2AR), Sehlendorf (Germany; Seh-2AR) and Ar (Sweden; Ar-2AR). We also established a strain of the Atlantic ecotype from Bergen (Ber-1SL, sympatric with Ber-2AR) and used two existing Atlantic ecotype laboratory strains from Helgoland (Germany; He-1SL) and Port-en-Bessin (France; Por-1SL). We confirmed the identity of the ecotypes in the laboratory by the absence of a lunar rhythm in the Baltic and Arctic ecotypes (Figure 1B–H), their circadian rhythm (Figure 1—figure supplement 1B–H) and their oviposition behavior (for details see methods section). The Baltic ecotype from Bergen (Ber-2AR, Figure 1F) was found weakly lunar-rhythmic. Evolutionary history and species status We sequenced the full nuclear and mitochondrial genomes of 168 field-caught individuals from six geographic sites, including the two sympatric population ecotypes from Bergen (Figure 1). Based on a set of 792,032 single nucleotide polymorphisms (SNPs), we first investigated population structure and evolutionary history by performing a principal component analysis (PCA; Figure 2A–B) and testing for genetic admixture (Figure 2C). We also constructed a haplotype network of complete mitochondrial genomes (Figure 2D). There are several major observations that can be derived from these data. Figure 2 with 7 supplements see all Download asset Open asset Genetic structure and evolutionary history of Northern European Clunio ecotypes. The analysis is based on 24 individuals from each population (23 for Por-1SL, 25 for He-1SL). (A, B) Principal Component Analysis (PCA) based on 792,032 SNPs separates populations by geographic location rather than ecotype. (C) ADMIXTURE analysis supports strong differentiation by geographic site (best K=6), but a notable genetic component from the Baltic Sea in the Bergen populations (see K=2 and 3). The Bergen populations are only separated at K=7 and then show a number of admixed individuals. (D) Haplotype network of full mitochondrial genomes reveals highly divergent clusters according to geographic site, but haplotype sharing between Ber-1SL and Ber-2AR. (E) Correlated allele frequencies indicate introgression from Seh-2AR into Ber-2AR. First, there is strong geographic isolation between populations from different sites. In PCA, clusters are formed according to geography (Figure 2A–B, Figure 2—figure supplement 1). Mitochondrial haplotypes are not shared and are diagnostically different between geographic sites (Figure 2D). In ADMIXTURE, the optimal number of genetic groups is six (Figure 2C, Figure 2—figure supplement 2), corresponding to the number of geographic sites, and there is basically no mixing between the six clusters (Figure 2C; K=6). Finally, there is isolation by distance (IBD; Figure 2—figure supplement 3). Second, the sympatric ecotypes in Bergen are genetically very similar. In PCA they are not separated in the first four principal components (Figure 2A–B) and they are the only populations that share mitochondrial haplotypes (Figure 2D). In the ADMIXTURE analysis, they are only distinguished at K=7, a value larger than the optimal K. As soon as the two populations are distinguished, some individuals show signals of admixed origin (Figure 2C; K=7), compatible with the assumption of ongoing gene flow and incomplete reproductive isolation. These observations question the species status of C. balticus, which was based on the assumption of temporal isolation between these two populations (Heimbach, 1978). Third, the data suggest that after the ice age Clunio colonized northern Europe from a single source and expanded along two fronts into the Baltic Sea and into the high Arctic. The mitochondrial haplotype network expands from a single center, which implies a quick radiation from a colonizing haplotype (Figure 2D; note that all sampled populations occur in areas under a former ice shield cover). There is also a large degree of shared nuclear polymorphisms. Thirty-four percent of polymorphic SNPs are polymorphic in all seven populations and 93% are polymorphic in at least two populations (Figure 2—figure supplement 4). Separation of the Baltic Sea populations along PC1 and the Arctic population along PC2 (Figure 2A), suggests that Clunio expanded into the high Arctic and into the Baltic Sea independently. Congruently, nucleotide diversity significantly decreases toward both expansion fronts (Figure 2—figure supplement 5, Supplementary file 1). Postglacial establishment from a common source indicates that the Baltic and Arctic ecotypes have evolved their new local adaptations during this expansion time, that is, in less than 10,000 years. Fourth, ADMIXTURE analysis reveals that sympatric co-existence of the Atlantic and Baltic ecotypes in Bergen likely results from introgression of Baltic ecotype individuals into an existing Atlantic ecotype population. At K=2 and K=3 the two Baltic Sea populations Seh-2AR and Ar-2AR are separated from all other populations and the two Bergen populations Ber-2AR and Ber-1SL show a marked genetic contribution coming from these Baltic Sea populations (Figure 2C). The Baltic genetic component is slightly larger for the Baltic ecotype Ber-2AR population than for the Atlantic ecotype Ber-1SL population. A statistical analysis via TreeMix detects predominantly an introgression from Seh-2AR into Ber-2AR (Figure 2E, Figure 2—figure supplements 6 and 7), suggesting that haplotypes that convey the Baltic ecotype have introgressed into a population of Atlantic ecotype, resulting in the now observed sympatric co-existence of Baltic and Atlantic ecotypes in Bergen. So far Bergen is the only site for which sympatric co-existence of the ecotypes is known. Further work will be required to see if there is a larger area of overlap, for example in the Skagerrak, and if we see similar signals of introgression or possibly secondary contact in other sympatric populations. While our observation adds to the evidence for a role of introgression in adaptation and early speciation processes (Edelman and Mallet, 2021; Baack and Rieseberg, 2007), it also constitutes an excellent starting point for mapping loci that could be involved in the respective ecotype adaptations. Under the assumption that fast adaptation and ecotype formation are mostly based on frequency changes in standing variation, one expects that the respective haplotypes should be shared between ecotypes, especially when transferred via introgression. The following analysis is focussed on the Atlantic and Baltic ecotypes, represented by three populations each. Nuclear haplotype sharing First, we reconstructed the genealogical relationship between 36 individuals (six from each population) in 50 kb windows (n=1,607) along the genome, followed by topology weighting. There are 105 possible unrooted tree topologies for six populations, and 46,656 possibilities to pick one individual from each population out of the set of 36. For each window along the genome, we assessed the relative support of each of the 105 population tree topologies by all 46,656 combinations of six individuals. We found that tree topologies change rapidly along the chromosomes (Figure 3A; Figure 3—figure supplement 1; Supplementary file 2). The tree topology obtained for the entire genome (Figure 3—figure supplement 2) dominates only in few genomic windows (Figure 3A, black bars ‘Orig.’), while usually one or several other topologies account for more than 75% of the tree topologies (Figure 3A, gray bars ‘Misc.’). Hardly ever do all combinations of six individuals follow a single population tree topology (Figure 3A, stars), which implies that in most genomic windows some individuals do not group with their population. Taken together, this indicates a massive sharing of haplotypes across populations. Figure 3 with 12 supplements see all Download asset Open asset Genome screens for haplotype sharing and genotype-ecotype associations. (A) Topology weighting of phylogenetic trees for 36 individuals from the Baltic and Atlantic ecotypes, as obtained from 50 kb non-overlapping genomic windows. Windows were marked by a bar if they were dominated by one kind of topology (wτ>75%). Most windows are not dominated by the consensus population topology (‘Orig.’; Figure 3—figure supplement 2), but by combinations of other topologies (‘Misc.’). Windows dominated by topologies that separate the Baltic and Atlantic ecotypes (‘Ecol.’) are mostly on chromosome 1. Windows consistent with introgression are found all over the genome (‘Intr.’). (B) Distribution of outlier variants (SNPs and Indels) between the six Baltic and Atlantic ecotype populations, after global correction for population structure (XtX statistic). Values below the significance threshold (as obtained by subsampling) are plotted in gray. (C) Association of variant frequencies with Baltic vs. Atlantic ecotype (eBPmc). Values below the threshold of 3 (corresponding to p=10–3) are given in gray, values above 10 are given in red. (D) Genetic differentiation (FST) between the sympatric ecotypes in Bergen. Values above 0.5 are given in black, values above 0.75 in red. (E) The distribution of SNPs with FST ≥ 0.75 in the Baltic vs. Atlantic ecotypes. Circled numbers mark the location of the eight most differentiated loci (see Figure 6). Centromeres of the chromosomes are marked by a red ‘C’. In order to test whether this is only incomplete lineage sorting or if there is additional introgression beyond the detected case in the Bergen populations, we calculated the f4 ratio test for the geographically distant Por-1SL and He-1SL, as well as Ar-2AR and Seh-2AR populations. The genome-wide f4 ratio was –0.00062 (standard error 0.00008) and by comparison to coalescent simulations was found to be significantly different from 0, indicating some degree of introgression. However, the signature of introgression was driven by only 775 SNPs, which is only 0.17% of the total dataset. Thus, the largest part of haplotype sharing can be explained by incomplete lineage sorting, typical for such radiation situations. Finally, we highlighted genomic windows that are consistent with the one clearly detected introgression event from the Baltic ecotype into both Bergen populations (Figure 3A, yellow bars ‘Intr.’; all topologies grouping Por-1SL and He-1SL vs Ber-1SL, Ber-2AR, Seh-2AR, and Ar-2AR; n=357). Regions consistent with this introgression event are scattered over the entire genome. Crosses and QTL mapping suggest that lunar rhythmicity is affected by more than one locus (Fig. 4) As a first attempt to get hold of ecologically relevant loci, we assessed the genetic basis to lunar rhythmicity by crossing the lunar-arrhythmic Ber-2AR strain with the lunar-rhythmic Ber-1SL strain. Interestingly, the degree of lunar rhythmicity segregates within and between crossing families (Figure 4—figure supplement 1), suggesting a heterogeneous multi-genic basis of lunar arrhythmicity. Genetic segregation in the cross implies that the weak rhythm observed in Ber-2AR is due to genetic polymorphism. The Ber-2AR strain seems to carry some lunar-rhythmic alleles, likely due to gene flow from the sympatric Atlantic ecotype (see Figure 1C and results below). For QTL mapping, we then picked a set of F2 families, which all go back to a single parental pair (BsxBa-F2-34; n=272; Figure 4—figure supplement 1) and designed a set of 23 SNP markers for QTL mapping (Figure 4—figure supplement 2). As a phenotype, we scored the degree of lunar rhythmicity for each individual as the number of days between the individual’s emergence and day 9 of the artificial turbulence cycle, which is the peak emergence day of the lunar-rhythmic strain (Figure 4—figure supplement 3). However, QTL analysis with various algorithms did not reveal any significant QTL for lunar rhythmicity (Figure 4). This absence of evidence may partially be due to the limited number of markers and individuals, as well as a specific shortcoming in the phenotyping. Arrhythmic individuals, which by chance emerge during the peak, are recorded as rhythmic. This leads to an inevitable phenotyping error for a fraction of individuals. However, in another study, we could show that this phenotyping error does not impede the detection of an oligogenic genetic architecture in QTL mapping for a comparable arrhythmic phenotype (Briševac et al., 2022). As a control that in principle we had sufficient power for QTL mapping in this crossing family, we also mapped the sex determining locus, which was clearly detectable (Figure 4). We can conclude that lunar rhythmicity is not controlled by a single large-effect locus, but rather has an oligogenic or possibly even polygenic basis. Figure 4 with 3 supplements see all Download asset Open asset Quantitative Trait Locus (QTL) mapping for lunar rhythmicity (blue) and the sex determining locus (red). The sex-determining locus is clearly detectable (red). No significant QTLs can be detected for lunar rhythmicity, suggesting it is controlled by more than one locus. Vertical black lines are genetic markers. Genomic regions associated with ecotype formation Given that QTL mapping for lunar rhythmicity suggested a multi-genic basis to ecotype formation, we next applied three approaches to identify genomic regions associated with divergence between Atlantic and Baltic ecotypes. First, genomic windows which are dominated by tree topologies that group populations according to ecotype were highlighted (Figure 3A, red bars ‘Ecol.’). Second, we screened all genetic variants (SNPs and indels; n=948,128) for those that are overly differentiated between the six populations after correcting for the neutral covariance structure across population allele frequencies (see Ω matrix, Figure 3—figure supplement 3A–B). Such variants can indicate local adaptation. At the same time, we tested for association of these variants with ecotype, as implemented in BayPass Gautier, 2015. Overly differentiated variants (XtX statistic; Figure 3B) and ecotype-associated variants ecotype (eBPmc; Figure 3C) were detected all over the genome, but many were concentrated in the middle of the telocentric chromosome 1. Tests for association of variants with environmental variables such as sea surface temperature or salinity find fewer associated SNPs and no concentration on chromosome 1 (Figure 3—figure supplement 3D–E), confirming that the detected signals are not due to general genome properties, but are specific to the ecotypes. Third, we expected that gene flow between the sympatric Ber-1SL and Ber-2AR populations would largely homogenize their genomes except for regions involved in ecological adaptation, which would be highlighted as peaks of genetic differentiation. While genetic differentiation may not only be due to adaptation, but also due to demographic processes, the ecologically relevant loci should be among the differentiated loci, especially if these loci coincide with peaks in ecotype-association (eBPmc). The distributions of FST values in all pairwise population comparisons confirmed that genetic differentiation was particularly low in the Ber-1SL vs. Ber-2AR comparison (Figure 3—figure supplements 4 and 5). Pairwise differentiation between Ber-1SL and Ber-2AR (Figure 3D) shows marked peaks on chromosome 1, most of which coincide with peaks in XtX and eBPmc. Notably, when assessing genetic differentiation of Baltic vs Atlantic ecotype (72 vs 72 individuals; Figure 3E; Figure 3—figure supplement 6), there is not a single diagnostic variant (FST = 1), and even variants with FST ≥ 0.75 are very rare (n=63; Figure 3E). Genetic divergence (dxy), nucleotide diversity (π) and local linkage disequilibrium (r2) of the two Bergen populations do not show marked differences along or between chromosomes (Figure 3—figure supplement 7). The cluster of ecotype-associated variants on chromosome 1 overlaps with three large blocks of long-range linkage disequilibrium (LD; Figure 3—figure supplement 8). This is also reflected in reduced linkage map length in our crossing mapping family (Figure 4—figure supplement 2). Reduced recombination may in part explain the clustering of divergent and ecotype-associated alleles in the middle of chromosome 1. However, the boundaries of the LD blocks do not correspond to the ecotype-associated region and differ between populations. LD blocks are not ecotype-specific. Local PCA of the strongly ecotype associated region does not reveal patterns consistent with a chromosomal inversion or another segregating structural variant (Figure 3—figure supplement 9). Thus, there is no obvious link between the clustering of ecotype-associated loci and structural variation. Notably, genetic differentiation is not generally elevated in the ecotype-associated cluster on chromosome 1, as would be expected for a segregating structural variant, but drops to baseline levels in between ecotype-associated loci (Figure 3D). Taken together, these observations imply that numerous genomic loci – inside and outside the cluster on chromosome 1 – are associated with ecological adaptation and none of these are differentially fixed between ecotypes. In line with QTL mapping, this virtually excludes the possibility that only one or few loci have driven the new ecotype formation, suggesting instead that ecotype formation is based on a complex polygenic architecture. Support for adaptation from standing genetic variation If, as assumed above, adaptation in these populations was indeed based mostly on standing genetic variation, one should find evidence of this in the data structure. We selected highly ecotype-associated SNPs (XtX >1.152094, threshold obtained from randomized subsampling; eBPmc >3; n=3,976; Figure 3—figure supplement 10A) and assessed to which degree these alleles are shared between the studied populations and other populations across Europe. Allele sharing between the Bergen populations is likely due to ongoing gene flow, and hence Bergen populations were excluded from the analysis. In turn, allele sharing between the geographically isolated Seh-2AR, Ar-2AR, Por-1SL, and He-1SL populations likely represents shared ancient polymorphism. Based on this comparison, we found that 82% of the ecotype-associated SNPs are polymorphic in both Atlantic and Baltic ecotypes, suggesting that the largest part of ecotype-associated alleles originates from standing genetic variation. We then retrieved the same genomic positions from published population resequencing data for Atlantic ecotype populations from Vigo (Spain) and St. Jean-de-Luz (Jean, southern France) Kaiser et al., 2016, an area that is potentially the source of postglacial colonization of all locations in this study. We found that 90% of the alleles associated with the Northern European ecotypes are also segregating in at least one of these southern populations, supporting the notion that adaptation in the North involves a re-assortment of existing standing genetic variation. Ecotypes differ mainly in the circadian clock and nervous system development In the light of a polygenic architecture for ecotype formation, we can expect that usually several genes of relevant physiological pathways are affected. This provides an ideal setting to identify the physiological processes underlying ecotype formation by enrichment analysis. Accordingly, we assessed how all ecotype-associated variants (SNPs and indels; XtX >1.148764; eBPmc >3, n=4,741; Figure 3—figure supplement 10B) relate to C. marinus’ genes. In a first step, we filtered the existing gene models in the CLUMA1.0 reference genome to those that are supported by transcript or protein evidence, have known homologues or PFAM domains, or were manually curated (filtered annotations provided in Supplementary file 3; 15, 193 gene models). Based on this confidence gene set, we then assessed the location of variants relative to genes, as well as the resulting mutational effects (SNPeff; Cingolani et al., 2012; Figure 3—figure supplement 11; statistics in Supplementary file 4). The vast majority of ecotype-specific variants are classified as intergenic modifier variants, suggesting that ecotype formation might primarily rely on regulatory mutations. The ecotype-specific SNPs are found in and around 1,400 genes (Supplementary files 5 and 6). We transferred GO terms from other species to the Clunio confidence annotations based on gene orthology (5,393 genes; see M

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call