Maize is known for its phenotypic and genetic diversity. On average, two maize lines diverge from one another as much as humans do from chimpanzees (Buckler et al., 2006). This diversity is attributed to pollen flow between domesticated maize and its wild relative teosinte, as well as to trading between farmers (Hake & Ross-Ibarra, 2015). Maize diversity contributes to its adaptability to new climates and growth conditions so that, currently, maize is grown across a wider area than any other crop (Hake & Ross-Ibarra, 2015). Analysing maize populations based on DNA sequence polymorphisms (markers) is the basis for identifying selection targets and understanding geographic relations. Many studies (Brandenburg et al., 2017; Chia et al., 2012) have revealed changes in genetic diversity by re-sequencing maize lines at differing sequencing depths, using samples from across America and Europe. However, comparing markers between datasets from different studies can be challenging: datasets can differ in allele frequencies, in the single nucleotide polymorphism (SNP)-calling pipelines, or in stochastic distributions of read depths, and might therefore identify different markers even in the same genomic region. To address this, Marcin Grzybowski, James Schnable and team set out to unify previous datasets and incorporate newly re-sequenced samples in one dataset (Grzybowski et al., 2023). Schnable obtained his PhD from the University of California–Berkeley, working on plant comparative genomics. During his lab rotations, he got hooked on computational biology because this approach leads to much faster results than wet-lab experiments. After a short postdoc at the Chinese Academy of Agricultural Sciences in Beijing, he was hired as a Professor at the University of Nebraska–Lincoln to work on computational biology.
He soon got involved in a new plant phenotyping platform being started there, and brought together teams of biologists, computer scientists and statisticians to work with the platform. “It's been a lot of fun”, he says. For the diversity dataset, Grzybowski and colleagues used whole-genome resequencing data from 1515 maize samples, comprising lines from the Wisconsin Diversity Panel (Hansey et al., 2011), inbred lines from Poland, as well as wild relatives, tropical landraces, archaeological samples and modern open-pollinated varieties. Overall, these samples originated in or were developed on six continents (Figure 1a). The sequence data were aligned to the maize reference genome, and over 350 million potential DNA sequence polymorphisms were identified. This set is therefore much larger than the approximately 83 million variants identified in the maize HapMap3 project, which included over 1200 maize accessions, but used a lower sequencing depth (Bukowski et al., 2018). Second-stage quality filtering of the new dataset reduced the number of variants to approximately 46 million higher-confidence variants.

Figure 1. A marker dataset for global maize diversity can be used to identify new genes associated with a trait. (a) Geographical distribution of the countries of origin for the 1515 maize individuals used in the study. (b) Association test using the marker set defined in this study identified MADS69 and ZCN8 based on female flowering data (days to silking). Adapted from Grzybowski et al. (2023).

Population genetic analyses are not only based on DNA sequence diversity, but also require the analysis of phenotypic traits. However, comparing phenotypes across different environments adds more variance and thereby reduces the statistical power to link genotype and phenotype. Comparing genotypes within the same environment is desirable, but not all genotypes are adapted to the same environment or can complete their life cycle in it.
To tackle this problem, researchers use association panels, which maximize genetic diversity among genotypes that are all adapted to a specific environment. Grzybowski and colleagues used their marker set to analyse the Wisconsin Diversity Panel (Hansey et al., 2011) and found that the lines retained over 70% of the DNA sequence variation found in the wild relative Zea mays ssp. parviglumis, suggesting that there is still a lot of genetic variation in this diversity panel. Therefore, the marker dataset can serve as a resource for other researchers to calculate accurate values of DNA sequence diversity for their populations. To analyse the impact of the high marker density on the outcomes of genome-wide association studies (GWAS), Grzybowski and colleagues used a published set of female flowering data generated from temperate-adapted maize inbreds (Mural et al., 2022). A previous GWAS using around 400 000 RNA-sequencing-based markers identified the flowering time gene MADS69 (Mural et al., 2022). With the newly generated marker set, Grzybowski and colleagues identified both MADS69 and a new locus, ZCN8, a gene that contributes to maize adaptation to temperate climates (Guo et al., 2018) (Figure 1b). Grzybowski and colleagues speculate that ZCN8 was not discovered previously because the earlier study used RNA-sequencing-based genetic markers and therefore missed significant SNPs in the intergenic space. Grzybowski and colleagues are hopeful that the higher density of markers will also help to achieve more precise localization of the causal variants associated with specific GWAS hits. Because of the range of diversity of the maize lines used, this dataset can also be used to detect selection patterns in the genome associated with traits of interest; for example, those related to domestication, adaptation to the environment or genetic improvement during breeding.
In the case of MADS69, Grzybowski and colleagues found less DNA sequence diversity in the promoter in tropical and temperate maize lines than in the wild relative. For ZCN8, they confirmed previous findings of lower DNA sequence diversity in domesticated maize than in teosinte (Guo et al., 2018). Additionally, they showed that the decline was biggest between tropical and temperate domesticated maize populations, whereas tropical maize diversity was similar to that of teosinte, suggesting that ZCN8 was selected during adaptation to temperate conditions rather than during domestication. Grzybowski, who is currently a senior researcher at the University of Warsaw, hopes to use the dataset to identify genes underlying maize adaptation to cold and to understand their evolutionary history, a topic that relates back to his PhD studies on maize adaptations to cold spring conditions. Including global maize diversity in the dataset, instead of focusing on a single GWAS population adapted to a single environment, will make it a lot easier for researchers to track the evolutionary histories and impact of selection on functional variants once they have been identified in a GWAS.
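For readers who want to see what such a diversity comparison involves, the following is a minimal sketch in Python. It is not the pipeline used by Grzybowski et al. (2023). It assumes biallelic SNPs summarised as allele counts per population, and it uses nucleotide diversity (the average number of pairwise differences per site) as the diversity measure. The function names, the window length and the allele counts are all hypothetical and chosen only for illustration.

```python
from typing import Sequence, Tuple

AlleleCounts = Sequence[Tuple[int, int]]  # (reference count, alternate count) per SNP


def nucleotide_diversity(allele_counts: AlleleCounts, n_sites: int) -> float:
    """Average pairwise differences per site for biallelic SNPs in one window.

    n_sites is the total window length, including invariant sites.
    """
    if n_sites <= 0:
        raise ValueError("n_sites must be positive")
    total = 0.0
    for ref, alt in allele_counts:
        n = ref + alt
        if n < 2:
            continue  # fewer than two sampled chromosomes: no pairs to compare
        # Probability that two distinct sampled chromosomes differ at this SNP
        total += 2.0 * ref * alt / (n * (n - 1))
    return total / n_sites


def diversity_ratio(focal: AlleleCounts, reference: AlleleCounts, n_sites: int) -> float:
    """Diversity of a focal population relative to a reference population."""
    reference_pi = nucleotide_diversity(reference, n_sites)
    if reference_pi == 0.0:
        raise ValueError("reference population has no diversity in this window")
    return nucleotide_diversity(focal, n_sites) / reference_pi


# Hypothetical allele counts for three SNPs in a 100-bp window
# (10 chromosomes sampled per population); these are NOT data from the study.
wild_counts = [(5, 5), (6, 4), (8, 2)]
temperate_counts = [(9, 1), (10, 0), (10, 0)]

ratio = diversity_ratio(temperate_counts, wild_counts, n_sites=100)
print(f"Temperate diversity relative to wild relative: {ratio:.2f}")
```

A ratio well below 1 in a window around a gene, alongside a ratio close to 1 in a second population, is the kind of pattern that the comparison between temperate maize, tropical maize and teosinte relies on. In practice, such values would be computed from a genotype file with established population genetics software rather than from hand-entered counts.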