Spinach (Spinacia oleracea) is an important leafy crop with notable economic value and health benefits. Current genomic resources include reference genomes and genome-wide association studies (GWAS). However, the worldwide genetic relationships and migration history of the crop have remained uncertain, and GWAS have produced extensive lists of genes related to agronomic traits. Here, we re-analysed the sequenced genomes of 305 cultivated and wild spinach accessions to unveil the phylogeny and history of cultivated spinach and to explore genetic variation in relation to phenotypes. In contrast to previous studies, we employed machine learning methods based on Extreme Gradient Boosting (XGBoost) to detect variants that are collectively associated with agronomic traits. Variant-based cluster analyses revealed three primary spinach groups, in the Middle East, Asia, and Europe/US. By combining admixture analysis and allele-sharing statistics, we present migration routes of spinach from the Middle East to Europe and Asia. Using XGBoost machine learning models, we predict genomic variants influencing bolting time, flowering time, petiole color, and leaf surface texture, and propose candidate genes for each trait. This study enhances our understanding of the history and phylogeny of domesticated spinach and provides valuable information on candidate genes for future genetic improvement of the crop.