Abstract

Predicting phenotypes from genome-wide genetic variation and gene expression data is useful in several fields, including human biology and medicine as well as crop and livestock breeding. However, few studies have predicted mammalian phenotypes from gene expression data, because gene expression profiling data for mammals are currently limited. By integrating several relevant data sources available for mice, this study investigated the accuracy of phenotype prediction for several physiological traits, using gene expression data from two tissues as well as single nucleotide polymorphisms (SNPs). For the studied traits, the variances of gene expression effects were more likely to differ among genes than were the variances of SNP effects among SNPs. For the glucose concentration, the total cholesterol amount, and the total tidal volume, cross-validated accuracy tended to be higher when gene expression data rather than SNP genotype data were used, and a statistically significant increase in accuracy was obtained when the liver gene expression data were used alone or jointly with the SNP genotype data. For these traits, using gene expression data from both the liver and the lung gave no additional gain in accuracy over using data from either tissue alone. We also examined prediction accuracy with genes selected by different criteria; using genes with higher tissue specificity tended to give accuracy similar to or greater than that obtained with all available genes for traits such as the glucose concentration and the total cholesterol amount. Although relatively few animals were evaluated, the current results suggest that gene expression levels could be used as explanatory variables for phenotype prediction. However, further studies with additional animal samples are essential to confirm these findings.
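The cross-validated accuracy described above can be illustrated with a minimal sketch. It assumes that accuracy is the Pearson correlation between observed phenotypes and pooled out-of-fold predictions, and it uses plain ridge regression with a fixed penalty `lam` as a stand-in for the study's regression models. The helper names (`solve`, `ridge_predict`, `cv_accuracy`), the fold count, and the synthetic data are hypothetical. Each row of `x` could hold either the SNP genotypes or the gene expression levels of one animal.

```python
import math
import random


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def ridge_predict(x_train, y_train, x_test, lam):
    """Ridge regression in dual form: beta = Xc^T (Xc Xc^T + lam*I)^-1 (y - mean(y))."""
    p = len(x_train[0])
    mu = [sum(row[j] for row in x_train) / len(x_train) for j in range(p)]
    y_mean = sum(y_train) / len(y_train)
    # Centre predictors with the training means only (no test-set leakage)
    xc = [[v - m for v, m in zip(row, mu)] for row in x_train]
    tc = [[v - m for v, m in zip(row, mu)] for row in x_test]
    n = len(xc)
    gram = [[dot(xc[i], xc[j]) + (lam if i == j else 0.0) for j in range(n)]
            for i in range(n)]
    alpha = solve(gram, [y - y_mean for y in y_train])
    return [y_mean + dot([dot(t, xr) for xr in xc], alpha) for t in tc]


def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)


def cv_accuracy(x, y, lam=1.0, k=5, seed=0):
    """k-fold CV accuracy: correlation of observed y with out-of-fold predictions."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    pred = [0.0] * len(y)
    for f in range(k):
        held_out = set(idx[f::k])
        train = [i for i in idx if i not in held_out]
        test = [i for i in idx if i in held_out]
        out = ridge_predict([x[i] for i in train], [y[i] for i in train],
                            [x[i] for i in test], lam)
        for i, yhat in zip(test, out):
            pred[i] = yhat
    return pearson(y, pred)


if __name__ == "__main__":
    rng = random.Random(1)
    n_animals, n_genes = 60, 200  # hypothetical sizes
    x = [[rng.gauss(0, 1) for _ in range(n_genes)] for _ in range(n_animals)]
    # Synthetic phenotype: the first 10 "genes" have unit effects, plus noise
    y = [sum(row[:10]) + rng.gauss(0, 2) for row in x]
    print(round(cv_accuracy(x, y, lam=50.0), 3))
```

Solving the ridge problem in its dual form only requires an n-by-n system (n = number of animals), which suits data in which genes or SNPs far outnumber animals. Comparing predictor types would amount to calling `cv_accuracy` once with SNP genotypes and once with expression levels for the same animals and folds.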

Highlights

  • Recent advances in high-throughput technologies have generated large amounts of single nucleotide polymorphism (SNP) data in many species

  • The data used were phenotypic values of physiological quantitative traits, genome-wide SNP genotypes, gene expression levels in the liver and lung of heterogeneous stock (HS) mice, and gene expression levels in several organs of C57BL/6 mice; the former two types of data were collected from the database of the Wellcome Trust Centre for Human Genetics

  • The deviance information criterion (DIC) [31] values obtained with the ridge and lasso regressions are shown in Fig. 1 for the cases in which SNP genotypes and gene expression levels were used as explanatory variables
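For reference, the commonly used definition of the DIC (Spiegelhalter and colleagues, 2002) is given below; we assume, without having checked, that [31] uses this standard form. Here $\theta$ denotes the model parameters, $p(y \mid \theta)$ the likelihood, $\bar{\theta}$ the posterior mean of $\theta$, and the expectation is taken over the posterior. A lower DIC indicates a better trade-off between fit and effective model complexity $p_D$. In the Bayesian formulations commonly used for these models, ridge regression gives all effects one shared prior variance, whereas the Bayesian lasso allows effect-specific variances, so a lower DIC for the lasso is one way such a comparison could indicate that effect variances differ among predictors.

```latex
D(\theta) = -2 \log p(y \mid \theta), \qquad
\bar{D} = \mathbb{E}_{\theta \mid y}\left[ D(\theta) \right], \qquad
p_D = \bar{D} - D(\bar{\theta}), \qquad
\mathrm{DIC} = \bar{D} + p_D = 2\bar{D} - D(\bar{\theta})
```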


Summary

Introduction

Recent advances in high-throughput technologies have generated large amounts of single nucleotide polymorphism (SNP) data in many species. Chen et al. [5] used information on gene expression in yeast under drug-free conditions, and Bhattacharjee and Sillanpää (2011) [7] used expression information from soybeans that were not infected with a pathogen. Although these studies suggest that gene expression data can be validly used to predict complex traits, it remains necessary to examine whether their positive results apply to other species, such as mammals [13]. By targeting several physiological traits in mice and employing different statistical methods for prediction, we assess the accuracy of phenotype prediction using gene expression data from two tissues as well as SNP genotype data.

Materials and Methods
Statistical methods for prediction
Results and Discussion