Introduction: The metabolome is the collection of small molecules in a biological sample, and these molecules may serve as biomarkers or predictors of heart disease. Whole genome sequence analysis offers the opportunity to investigate rare and low-frequency annotated variants across the human genome. We used whole genome sequence analysis to characterize the genetic architecture of the serum metabolome. Methods: Whole genome sequencing and measurement (by chromatography and mass spectrometry) of 245 serum metabolites were performed in 1,458 European Americans and 1,679 African Americans from the Atherosclerosis Risk in Communities (ARIC) study, and these data were used to perform a trans-ethnic meta-analysis. Common variants (MAF>5%) were analyzed individually using an additive genetic model. Rare and low-frequency protein-altering variants (MAF≤5%) were aggregated by gene. To determine the contribution of regulatory and non-protein-coding regions of the genome, we conducted aggregate tests across the entire genome using a 4 kb sliding window, as well as in predefined regulatory elements, which include enhancers, promoters, and the 3′ and 5′ untranslated regions of genes. Results: We identified 119 significant associations between genetic variants and metabolite levels (significance threshold p<2.0×10⁻¹⁰ for single variants, p<2.9×10⁻¹⁰ for aggregate tests), of which 49 were novel, including genes involved in known Mendelian conditions, protein biological processes, and disease-related pathways. Six genes (DMGDH, AGA, ACY1, PRODH, DDC and CPS1) that cause rare inborn errors of metabolism were associated with amino acid levels in the general population. A predicted regulatory variant in the AGA gene, encoding a protein involved in asparagine generation, was associated with serum asparagine levels independent of any coding variants in this gene.
Seven genes (ABCC2, PKD2L1, SLC10A1, FDX1, CYP3A43, UGT2B15 and SULT2A1) whose products are involved in secretion, channeling and transport were associated with lipid-related metabolite levels. Analysis of regulatory regions revealed associations between three steroid lipids and a member of the cytochrome P450 family, CYP3A43. Five genes within the kinin-kallikrein pathway (KLKB1, KNG1, F12, ACE and CPN1) were associated with small peptide levels. Variants in CPN1, which is known to bind to fibrinogen, were associated with DSGEGDFXAEGGGVR, a peptide produced during the conversion of fibrinogen to fibrin. Conclusion: This study outlines an approach to characterize the genetic architecture of the human serum metabolome and shows that sequence variants affect multiple human metabolites. The next step is to use the principle of Mendelian randomization to determine whether any of these metabolites lie on causal pathways to disease.
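As an illustration of the sliding-window aggregation described in the Methods, the following minimal Python sketch groups rare and low-frequency variants (MAF≤5%) into 4 kb windows before any aggregate test is run. It is not the study's pipeline: the function names, the 2 kb step between windows, and the toy variant list are all assumptions made for this example, since the abstract states only the window size and the MAF cutoff.

```python
from collections import defaultdict


def sliding_windows(start, end, size=4000, step=2000):
    """Yield half-open (window_start, window_end) intervals covering [start, end).

    The 4 kb size comes from the abstract; the 2 kb step is an assumption.
    """
    pos = start
    while pos < end:
        yield pos, min(pos + size, end)
        pos += step


def group_rare_variants(variants, start, end, maf_max=0.05, size=4000, step=2000):
    """Map each window to the positions of variants with MAF <= maf_max inside it.

    `variants` is a list of (position, maf) tuples (hypothetical input format).
    Windows that contain no qualifying variant are omitted from the result.
    """
    groups = defaultdict(list)
    for w_start, w_end in sliding_windows(start, end, size, step):
        for position, maf in variants:
            if maf <= maf_max and w_start <= position < w_end:
                groups[(w_start, w_end)].append(position)
    return dict(groups)


# Toy data: (position in bp, minor allele frequency). Not real variants.
variants = [(1500, 0.01), (3900, 0.20), (4200, 0.03), (9000, 0.004)]
windows = group_rare_variants(variants, 0, 10000)
for window, positions in sorted(windows.items()):
    print(window, positions)
```

Each resulting group would then be passed to an aggregate association test of the analyst's choice. Because the windows overlap, a single variant can contribute to more than one test, which is one reason aggregate tests use their own multiple-testing threshold.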