Abstract Genome-wide association studies (GWAS) are widely used to reveal associations between genetic variation and phenotypes in a population of individuals. However, they have been criticized for missing important genetic markers, often because the data do not fit the statistical models well. In this study, we address the challenge of identifying significant single nucleotide polymorphisms (SNPs) in GWAS using two regression models, BIGLASSO and AUTALASSO, both variants of the least absolute shrinkage and selection operator (LASSO). We present a detailed comparative analysis in Arabidopsis thaliana that shows how each method specializes in uncovering SNPs for different trait types. Our findings indicate that BIGLASSO aligns more closely with GWAS results and performs particularly well on binary traits, even when these are derived from categorical phenotypes. AUTALASSO could be effective for quantitative traits and could complement GWAS. We demonstrate that these LASSO-based methods can significantly enhance the identification of genetic markers, offering a potent complement to traditional GWAS approaches. Our findings bridge the gap between statistical and machine learning methodologies in genetic studies and provide a practical framework for researchers seeking to validate reported SNPs or explore new genomic regions for trait association. This work is a step towards integrating advanced computational techniques into genomics, paving the way for more precise and comprehensive genetic analyses.