Integrating genotypic and environmental data can improve the prediction accuracy of field traits in crops. Existing genomic prediction methods neglect environmental factors and the real growing environment of crops, which results in low prediction accuracy. In this work, we propose GEFormer, a genomic prediction method for genotype-environment interaction in maize that integrates a gated MLP with a linear attention mechanism. First, it uses a gated multilayer perceptron (gMLP) to extract local and global features among SNPs. Then, Omni-dimensional Dynamic Convolution is used to extract dynamic and comprehensive features of the multiple environmental factors within each day, reflecting the real growth pattern of crops, and a linear attention mechanism captures the temporal features of environmental changes. Finally, a gating mechanism fuses the genomic and environmental features. We validate the accuracy of GEFormer in predicting important agronomic traits of maize, rice and wheat in three experimental scenarios: untested genotypes in tested environments, tested genotypes in untested environments, and untested genotypes in untested environments. Experimental results show that GEFormer outperforms six cutting-edge statistical learning methods and four machine learning methods, with a particularly clear advantage in the scenario of untested genotypes in untested environments. In addition, we applied GEFormer to three real-world breeding applications: phenotype prediction in unknown environments, hybrid phenotype prediction using an inbred population, and cross-population phenotype prediction. The results show that GEFormer exhibits better prediction performance in actual breeding scenarios and can be used to assist crop breeding.
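As an illustration of the final fusion step, the following is a minimal sketch of one common form of gated fusion, in which a sigmoid gate computed from both feature vectors decides, per dimension, how much of the genomic feature and how much of the environmental feature to keep. The abstract does not give the exact formulation used in GEFormer, so the function name `gated_fusion`, the gate equation, and the parameters `weights` and `bias` are all assumptions made for illustration.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(geno, env, weights, bias):
    """Fuse two equal-length feature vectors with an element-wise gate.

    Assumed (hypothetical) formulation, not taken from the paper:
        gate  = sigmoid(W @ concat(geno, env) + b)
        fused = gate * geno + (1 - gate) * env

    geno, env : lists of floats with the same length d
    weights   : d rows, each with 2*d floats (the matrix W)
    bias      : list of d floats (the vector b)
    """
    if len(geno) != len(env):
        raise ValueError("geno and env must have the same length")
    concat = list(geno) + list(env)
    fused = []
    for i in range(len(geno)):
        # Pre-activation of the gate for dimension i.
        z = sum(w * x for w, x in zip(weights[i], concat)) + bias[i]
        gate = sigmoid(z)
        # Convex combination of the two sources for this dimension.
        fused.append(gate * geno[i] + (1.0 - gate) * env[i])
    return fused


# Toy inputs: with zero weights and bias the gate is 0.5 everywhere,
# so the fused vector is the plain average of the two inputs.
geno_feat = [1.0, 2.0]
env_feat = [3.0, 4.0]
zero_w = [[0.0] * 4 for _ in range(2)]
averaged = gated_fusion(geno_feat, env_feat, zero_w, [0.0, 0.0])
```

In a trained model the weights and bias would be learned, so the gate could lean toward the genomic features for some dimensions and toward the environmental features for others.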