Abstract
Here we analyse genetic variation, population structure and diversity among 3,010 diverse Asian cultivated rice (Oryza sativa L.) genomes from the 3,000 Rice Genomes Project. Our results are consistent with the five major groups previously recognized, but also suggest several unreported subpopulations that correlate with geographic location. We identified 29 million single nucleotide polymorphisms, 2.4 million small indels and over 90,000 structural variations that contribute to within- and between-population variation. Using pan-genome analyses, we identified more than 10,000 novel full-length protein-coding genes and a high number of presence–absence variations. The complex patterns of introgression observed in domestication genes are consistent with multiple independent rice domestication events. The public availability of data from the 3,000 Rice Genomes Project provides a resource for rice genomics research and breeding.
Highlights
We analyse genetic variation, population structure and diversity among 3,010 diverse Asian cultivated rice (Oryza sativa L.) genomes from the 3,000 Rice Genomes Project
Using pan-genome analyses, we identified more than 10,000 novel full-length protein-coding genes and a high number of presence–absence variations
We report pan-genome analyses for O. sativa, and the high numbers of presence–absence variations (PAVs) highlight another component of within-species diversity for rice
Summary
[Figure: presence frequency, scale 0 to 1]

Structural variations (SVs) were classified as major-group-unbalanced when they were unevenly distributed among the XI, GJ, cA and cB groups according to two-sided Fisher's exact tests.

A further set of gene families, present in all major groups, formed candidate core gene families, and the remaining 9,050 (37.9%) comprised distributed gene families (Fig. 4a, b and Supplementary Data 3 Table 3). The O. sativa pan-genome therefore consists of between 12,770 and approximately 14,826 (53.5% to about 62.1%) core gene families, depending on whether candidate core families are included, and at least 9,050 (37.9%) distributed gene families. Each accession contains between 63.4% and about 73.5% core gene families and at least 26.5% distributed gene families (Fig. 4b).

We found that 98.4% of the IR 8 genome sequence and 98.6% of the N 22 genome sequence could be mapped to the pan-genome, whereas only 94.3% and 94.0%, respectively, could be found in the Nipponbare RefSeq. By comparing the pan-genome data with the high-quality XI reference genomes of Zhenshan 97 and Minghui 63, we found that approximately 25% of the novel genes were shorter than their counterparts in these references because they were predicted from fragmented sequences (Extended Data Fig. 5c, d).

We identified 4,270 XI and 1,384 GJ subpopulation-unbalanced gene families, showing variation between subpopulations within each major group (Extended Data Fig. 7g). Correlating gene PAVs with plant height identified the well-known Green Revolution gene sd1 as the top-ranked candidate. sd1 is classified as a distributed gene because of an approximately 385-bp deletion, and its absence is significantly associated (P < 10⁻²⁰) with greatly reduced plant height; it was most frequently absent in XI-1A and XI-1B varieties (Extended Data Fig. 11).
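The core, candidate core and distributed split, and the two-sided Fisher's exact test used to flag unbalanced SVs, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's pipeline: the 99% cut-off for candidate core families (`CANDIDATE_FRAC`) and the one-group-versus-rest 2×2 table for the unbalanced test are hypothetical choices, and all function names are illustrative. The candidate core count of 2,056 is not stated in the text; it is the difference between the two core bounds (14,826 − 12,770).

```python
from math import comb

# HYPOTHETICAL threshold: the source does not state the cut-off for
# "candidate core"; 0.99 is an assumption for illustration only.
CANDIDATE_FRAC = 0.99


def classify_family(present: int, total: int, candidate_frac: float = CANDIDATE_FRAC) -> str:
    """Label a gene family by how many of `total` accessions carry it."""
    if present == total:
        return "core"            # present in every accession
    if present >= candidate_frac * total:
        return "candidate core"  # nearly universal (assumed cut-off)
    return "distributed"         # absent from a noticeable share of accessions


def shares(counts: dict[str, int]) -> dict[str, float]:
    """Percentage of all gene families in each class, rounded to 0.1%."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


def fisher_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    With row and column totals fixed, cell `a` follows a hypergeometric
    distribution; the p-value sums the probabilities of every table that is
    no more likely than the observed one.
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x: int) -> float:
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # Small relative tolerance so tables tied with the observed one are counted.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


if __name__ == "__main__":
    # Counts from the text; 2,056 = 14,826 - 12,770 (derived, not stated).
    counts = {"core": 12_770, "candidate core": 2_056, "distributed": 9_050}
    print(shares(counts))
    print(classify_family(3_010, 3_010), classify_family(3_000, 3_010))
    # Hypothetical toy SV table: rows = one group vs all other groups,
    # columns = carriers vs non-carriers.
    print(fisher_two_sided(3, 1, 1, 3))
```

With these counts, `shares` returns 53.5, 8.6 and 37.9 percent, matching the percentages quoted above; this is a quick check that the two core bounds and the distributed count imply the same total of 23,876 gene families.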