Abstract

Rapid improvement in technology has made it relatively cheap to collect genetic data; however, statistical analysis of existing data is still much cheaper. Thus, secondary analysis of single-nucleotide polymorphism (SNP) data, i.e., reanalysing existing data in an effort to extract more information, is an attractive and cost-effective alternative to collecting new data. We study the relationship between gene expression and SNPs through a combination of factor analysis and dimension reduction estimation. To take advantage of the flexibility of traditional factor models, in which the latent factors are not required to be normal, we recommend using semiparametric sufficient dimension reduction methods in the joint estimation of the combined model. The resulting estimator is flexible and outperforms the existing estimator, which relies on additional assumptions on the latent factors. We quantify the asymptotic performance of the proposed parameter estimator and perform inference by assessing the estimation variability and constructing confidence intervals. The new results enable us to identify, for the first time, statistically significant SNPs in gene-SNP relations in lung tissue from genotype-tissue expression data.
