Advances in rapid genomic sequencing have transformed disease biomarker identification by providing vast amounts of genetic information. Complex traits are influenced not only by single genetic loci but also by interactions among multiple loci. When the dimensionality of SNP data is high, identifying the genetic variants associated with a disease becomes extremely challenging. To address this high dimensionality, we employ functional data analysis techniques. Because many ordered genetic variants are densely spaced within a small genomic region, the variants in that region are treated as observations of a continuous function rather than as discrete variables. This paper introduces a novel approach for analyzing the association of multiple genes within a region by employing an integrative functional logistic regression model. The proposed method shows promising results in both simulation and real data analysis, indicating that it can generate smooth signals and accurately estimate the coefficient function while recognizing null regions. The integrative functional logistic regression method adopts functional data analysis and assumes that high-dimensional genetic data follow a continuous process. It naturally accommodates correlations among adjacent SNPs and avoids the unstable estimation of a large number of parameters. This is especially desirable given the rapidly increasing dimensionality of SNP data and the still limited sample sizes. In summary, the proposed approach offers a valuable new avenue for identifying disease-related genetic variants in GWAS.
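
As a rough illustration of treating ordered SNPs as a continuous process, the following Python sketch fits a plain functional logistic regression in which the coefficient function is expanded in a small basis, so only a handful of parameters are estimated regardless of the number of SNPs. It is not the integrative model described in this abstract: the cosine basis, the ridge penalty, the Newton solver, the rescaling of positions to [0, 1], and the names `cosine_basis` and `fit_functional_logistic` are all assumptions made for illustration, and the sketch has no mechanism for recognizing null regions.

```python
import numpy as np


def cosine_basis(t, n_basis):
    """Evaluate cos(pi * k * t) for k = 0..n_basis-1 at positions t in [0, 1]."""
    k = np.arange(n_basis)
    return np.cos(np.pi * np.outer(t, k))  # shape (len(t), n_basis)


def fit_functional_logistic(G, t, y, n_basis=5, ridge=1e-2, n_iter=50):
    """Hypothetical helper: fit logit P(y=1) = a + integral of G_i(t) * beta(t) dt.

    G : (n_samples, n_snps) genotype matrix, rows viewed as functions of t
    t : (n_snps,) SNP positions rescaled to [0, 1] (an assumption of this sketch)
    y : (n_samples,) binary disease status coded 0/1
    """
    Phi = cosine_basis(t, n_basis)
    # Approximate the integral by an equally weighted average over SNP positions.
    Z = G @ Phi / len(t)
    X = np.column_stack([np.ones(len(y)), Z])

    # Ridge penalty on the basis coefficients only; the intercept gets a tiny
    # jitter purely for numerical stability.
    P = ridge * np.eye(X.shape[1])
    P[0, 0] = 1e-8

    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        grad = X.T @ (p - y) + P @ w
        hess = X.T @ (X * (p * (1.0 - p))[:, None]) + P
        w = w - np.linalg.solve(hess, grad)  # Newton step

    intercept, coef = w[0], w[1:]
    beta_hat = Phi @ coef  # estimated coefficient function at each SNP position
    return intercept, coef, beta_hat


if __name__ == "__main__":
    # Purely synthetic data, generated only to show the call pattern.
    rng = np.random.default_rng(0)
    t = np.linspace(0.0, 1.0, 50)
    G = rng.integers(0, 3, size=(400, 50)).astype(float)  # genotypes 0/1/2
    y = rng.integers(0, 2, size=400).astype(float)
    a, b, beta = fit_functional_logistic(G, t, y)
    print(beta.shape)
```

The design point is that the number of estimated parameters equals `n_basis + 1` rather than the number of SNPs, and smoothness of the basis functions ties neighboring SNP effects together.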