Abstract

Sesame (Sesamum indicum) is an old and important oilseed crop that provides humans with high-quality oils and numerous health-promoting compounds. The low per-unit yield of sesame, coupled with the increasing demand for seeds and derivatives, drives breeding efforts to develop high-yielding sesame varieties with improved quality. The main objective of genetic studies on crops is to uncover how specific variants and genes underlie agronomic trait variation, so that this knowledge can be applied in advanced molecular breeding. Developing high-quality genomic resources is a prerequisite for understanding crop genomes and for enhancing crop improvement and breeding to meet various societal needs (Michael and VanBuren, 2020). Genomic variation is widespread and is the main source of genomic differences; it can be divided into single-nucleotide polymorphisms (SNPs, 1 bp), small insertions/deletions (InDels, ≤50 bp) and structural variations (SVs, >50 bp). In addition to SNPs and InDels, SVs are widespread in plant genomes and affect various agronomic traits (Liu et al., 2020). SVs can cause wide-ranging perturbations of cis-regulatory regions, thereby affecting the genetic determination of agronomic traits. Hence, identifying SVs is essential for the development of new tools to accelerate sesame genomics research and genetic improvement.

To provide high-quality chromosome-scale assemblies of sesame genomes, five varieties, comprising one modern cultivar (Zhongzhi13) and four landraces (Dongyangmi, EC34, Qiongzhongbai, and Zhima8131), were de novo sequenced (Table S1, Figure S1). In total, 41.13–72.77 Gb (~133–242X) of long-read sequencing data (Table S2) and 33.2 Gb (~108X) of Hi-C data were generated and subsequently assembled. The final assembled genome sizes of the five sesame varieties ranged from 304 to 321 Mb and the contig N50 from 4.79 to 6.83 Mb, with scaffold N50 values of 20.49–23.42 Mb (Table S3).
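The contig and scaffold N50 values quoted above follow the usual definition: the length of the sequence at which the cumulative length, counted from the longest sequence downwards, first reaches half of the total assembly length. The following is a minimal sketch of that calculation only; the function name `n50` and the toy lengths are illustrative assumptions and are not taken from the study's pipeline or data.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembled length.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy lengths in arbitrary units (hypothetical, not from the assemblies).
toy_contigs = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(toy_contigs))  # total is 32; the two longest contigs reach 16, so N50 is 8
```

A higher N50 for the same total length means the assembly is made of fewer, longer pieces, which is why it is reported alongside genome size.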
The Benchmarking Universal Single-Copy Orthologs (BUSCO) analyses revealed a completeness rate of over 97.5% for the five assembled genomes (Table S4). Annotation analysis revealed that 48.13%–49.14% of the genome consisted of repeat sequences (Figure S2, Table S5), and a total of 28 410–28 922 high-quality protein-coding genes were predicted (Table S6). A maximum of 92.28% (91.25% on average) of the annotated genes are homologous to known proteins in other species (Table S7). Through comparison with the Zhongzhi13 genome, we identified a total of 524 878–676 611 SNPs, 288 021–385 611 InDels, and 8499–10 678 SVs in the other four sesame varieties (Table S8). SVs affected approximately 5215 genes, including 1061 severely affected genes (Table S8). The most affected genes include SiCEN2, a Centroradialis-like gene, which is homologous to TERMINAL FLOWER 1 (TFL1) in Arabidopsis thaliana and functions to maintain plant vegetative growth and inflorescence meristem features (Zhu and Wagner, 2020). In addition, we identified 177 SV hotspot regions that were unevenly distributed across 13 chromosomes (Table S9). A total of 2183 genes were affected by these hotspot regions, 236 of which were severely affected. Enrichment analyses showed that these 236 genes were mainly related to lipid metabolism and response to stress (Figure S3). These results indicate that SVs are important for understanding sesame biology and quality at the molecular level.

SVs contribute significantly to genetic diversity and underlie natural variation of various agronomic traits (Liu et al., 2020). To investigate the impact of SVs on the genetics and yield of the sesame population, 213 cultivated sesame accessions and one wild sesame accession (DSFP) from 20 countries were collected and resequenced (Tables S10 and S11). Through integrated analysis, we identified 1 320 760 SNPs and 14 013 SVs for structure analysis and genome-wide association study (GWAS).
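The SNP, InDel and SV counts above depend on the size thresholds given at the start of this section (SNP = 1 bp, InDel ≤50 bp, SV >50 bp). As a rough illustration of how such thresholds partition variants, the sketch below classifies a variant from a pair of reference and alternative allele strings. The function name `classify_variant`, the allele-string representation and the extra "other" label are assumptions made for this example; the study's actual variant-calling workflow is described in its supplementary methods and is not reproduced here.

```python
def classify_variant(ref: str, alt: str) -> str:
    """Classify a variant by size using the thresholds stated in the text.

    Assumes explicit allele strings (hypothetical input format); symbolic
    alleles and complex rearrangements are outside this sketch.
    """
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"  # single-base substitution (1 bp)
    size = abs(len(ref) - len(alt))  # net inserted or deleted length in bp
    if size == 0:
        # Equal-length multi-base substitution: not one of the three classes.
        return "other"
    return "InDel" if size <= 50 else "SV"


print(classify_variant("A", "G"))             # single-base change
print(classify_variant("A", "ATT"))           # 2-bp insertion
print(classify_variant("A" + "C" * 51, "A"))  # 51-bp deletion
```

Under these thresholds a 50-bp insertion is still counted as an InDel, and a 51-bp event is the smallest SV.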
Most SVs were of small size (Figure S4a), with over 50% of deletions and over 50% of duplications dominated by DNA transposable elements (Figure S4b). Population structure analysis using the SNP and SV datasets yielded highly consistent results (Figure 1a). The tested sesame accessions could be categorized into three groups based on their latitude (Figure S5). These results were further supported by SNP- and SV-based principal-component analyses (Figure 1b,c).

To unveil genes under positive selection in modern cultivars, we focussed on the 47 modern Chinese sesame varieties (MCSV) produced in the past decades (Table S10). Compared with landraces, MCSVs had a longer maturity period, greater plant height, and higher yield per plant, sesamin content and oil content (Figure S6). Selective sweep analysis based on SNPs and SVs identified 865 and 892 selected genes in MCSV, respectively (Figures S7 and S8). Only 208 of these genes were identified by both the SNP- and SV-based analyses (Figure S9). The SNP-based selected genes were mainly assigned to the response to oxidative stress and peroxidase activity (Figure S10), while the SV-based selected genes were mostly enriched in phenylpropanoid biosynthesis, flavonoid biosynthesis and related pathways of oil metabolism (Figure S11). Notably, 18 of these genes were associated with phenylpropanoid biosynthesis, indicating the importance of this pathway for sesame improvement.

Sesame yield is controlled by multiple factors, among which plant architecture (PA, branched/single-stemmed) and capsule number per leaf axil (CNPLA, one/three) are the most important determinants (Figure 1d,e). Agro-morphological evaluations in different environments revealed that PA and CNPLA were strongly correlated (r = 0.61). SNP-based and SV-based GWAS identified two significant pleiotropic loci (peak SNP at Chr7:5944520 and peak SV at Chr11:13894919) for PA and CNPLA (Figure 1f–i, Table S12).
The peak SNP on Chr7 is a nonsynonymous mutation in the exon of the gene SiACS9 (Table S13). SiACS9 is homologous to AtACS9, which encodes an enzyme catalysing the rate-limiting step in the ethylene biosynthetic pathway (Tsuchisaka and Theologis, 2004). The peak SV on Chr11 is located precisely in SiCEN2 and consists of a 465-bp absence variant (a Copia-type LTR) in the mutated sequence (Table S13). This variant alters the sequence of the last two exons, resulting in amino acid deletions and frameshifts (Figure 1j). Interestingly, these results corroborate the selective sweep analyses (Figures S12 and S13). Furthermore, the peak SV on Chr11 co-localizes with previously detected QTL regions for the same traits (Mei et al., 2017; Figure 1k). We then sequenced the two parents of the mapping population and found that Yuzhi4 (the single-stemmed parent with three capsules per leaf axil, 3CsPLA) contained this 465-bp LTR insertion, while BS (the branched parent with one capsule per leaf axil, 1CPLA) did not (Figure 1l; Table S14).

The nucleotide diversity of single-stemmed and 3CsPLA accessions was lower than that of their branched and 1CPLA counterparts, suggesting that these two phenotypes were under selection pressure (Figure S13). Most of the sesame varieties carrying the reference sequence are single-stemmed with 3CsPLA, while those carrying the alternative sequence are branched with 1CPLA (Figure S14). Nucleotide diversity analysis of SiCEN2 shows that it has been positively selected during breeding (Figure 1m). Consistent with this, 41.12% of landraces are branched with 1CPLA, while 76.6% of MCSV are single-stemmed with 3CsPLA (Figure S15). The yields per plant and per plot of 3CsPLA and single-stemmed sesame were significantly higher than those of 1CPLA and branched sesame, respectively (Figure S16). Identical results were recorded when the two traits were combined in a single plant (Figure S17).
More importantly, the yields per plant and per plot of sesame with the reference SiCEN2 allele were significantly higher than those of sesame with the alternative allele (Figure 1n). These results are consistent with the sesame breeding goal of increasing yield by raising planting density per unit area.

This study presents five high-quality chromosome-level reference genomes and identifies SVs in a natural sesame population. The natural 465-bp LTR insertion in SiCEN2 might be involved in the formation of the single-stemmed and 3CsPLA phenotypes, ultimately increasing sesame yield. Our findings offer valuable genomic resources and an efficient approach for accelerating sesame genomics, evolution and marker-trait association studies.

This research was funded by the Agricultural Science and Technology Innovation Project of the Chinese Academy of Agricultural Sciences (CAAS-ASTIP-2021-OCRI) and the Earmarked Fund for China Agriculture Research System (CARS-14).

The authors declare no conflicts of interest.

S.S. analysed the data and wrote the manuscript. L.W. and J.Y. conceived this project. Other authors carried out analyses and experiments. All authors have read and approved the final version of the manuscript.

The sequencing data are available at the China National Center for Bioinformation (GSA accession numbers: CRA007828 and CRA006452).

Appendix S1 Supplementary methods, figures, and tables.
