Abstract

Complex traits such as obesity are manifestations of intricate interactions among multiple genetic factors. However, such relationships are difficult to identify. Thanks to recent advances in high-throughput technology, large amounts of data have been collected for various complex traits, including obesity. These data often measure different biological aspects of the traits of interest, including genotypic variation at the DNA level and gene expression alterations at the RNA level. Integrating such heterogeneous data offers promising opportunities to understand the genetic components, and possibly the genetic architecture, of complex traits. In this paper, we propose a machine-learning-based method, module-guided Random Forests (mgRF), that integrates genotypic and gene expression data to investigate the genetic factors and molecular mechanisms underlying complex traits. mgRF is an augmented Random Forests method enhanced by a network analysis that identifies multiple correlated variables of different types. We applied mgRF to genetic marker and gene expression data from a cohort of female mice from an F2 intercross. In our extensive comparison, mgRF outperformed several existing methods. Our approach performs better when combining genotypic and gene expression data than when using either type of data alone. The predictive variables identified by mgRF provide information about perturbed pathways related to body weight. More importantly, the results uncovered intricate interactions among genetic markers and genes that would have been overlooked if only one type of data had been examined. Our results shed light on the genetic mechanisms of obesity, and our approach provides a promising framework, complementary to “genetics of gene expression” analysis, for integrating genotypic and gene expression information to analyze complex traits.
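The abstract states that mgRF uses a network analysis to find groups (modules) of correlated variables of different types, but it does not say how those modules are built. Purely as a hypothetical illustration, and not the authors' procedure, the sketch below treats genotypes (coded as allele counts 0/1/2) and expression values alike, links any two variables whose absolute Pearson correlation reaches an assumed threshold of 0.7, and reports the connected components of that graph as modules. The function names, the threshold and the toy data are all invented for this example.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant variable is treated as uncorrelated
    return sxy / math.sqrt(sxx * syy)


def find_modules(variables, threshold=0.7):
    """Hypothetical module rule: connected components of the graph whose edges
    join variable pairs with |r| >= threshold, found with union-find."""
    names = list(variables)
    parent = {name: name for name in names}

    def root(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving
            name = parent[name]
        return name

    for a, b in combinations(names, 2):
        if abs(pearson(variables[a], variables[b])) >= threshold:
            parent[root(a)] = root(b)  # merge the two components

    modules = {}
    for name in names:
        modules.setdefault(root(name), []).append(name)
    return sorted(sorted(members) for members in modules.values())


# Toy data for six animals: markers are allele counts, genes are expression levels.
TOY_VARIABLES = {
    "marker_A": [0, 1, 2, 2, 1, 0],
    "gene_X": [1.1, 2.0, 3.1, 2.9, 2.0, 0.9],
    "gene_Y": [5.0, 4.0, 5.0, 4.0, 5.0, 4.0],
    "marker_B": [2, 0, 2, 0, 2, 0],
}

if __name__ == "__main__":
    print(find_modules(TOY_VARIABLES))
    # → [['gene_X', 'marker_A'], ['gene_Y', 'marker_B']]
```

Each toy module pairs a genetic marker with an expression variable, which is the kind of cross-type grouping the abstract describes; a real analysis would need far more samples and a principled way of choosing the threshold.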

Highlights

  • Most complex traits such as obesity involve a diverse set of genes, intricate interplay among them and subtle interactions between genetic and environmental factors

  • One of the first steps toward a systematic understanding of the genetic basis of a complex trait is the identification of causal genetic elements, e.g. genes, genetic markers and/or single nucleotide polymorphisms (SNPs), whose variations are responsible for the trait

  • The first challenge is the curse of dimensionality in selecting a subset of genetic elements related to the traits of interest from a large number of candidates
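A quick simulation shows why selecting a few relevant elements from a very large pool of candidates is hard when samples are scarce. In this hypothetical sketch (the sample and feature counts are illustrative, not taken from the paper), neither the trait nor any candidate feature carries real signal, yet the strongest chance correlation among thousands of candidates is large when there are few samples, and it shrinks once samples are plentiful.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def max_spurious_correlation(n_samples=20, n_features=5000, seed=0):
    """Largest |r| between a random 'trait' and purely random candidate features."""
    rng = random.Random(seed)  # seeded so the result is reproducible
    trait = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    best = 0.0
    for _ in range(n_features):
        feature = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        best = max(best, abs(pearson(trait, feature)))
    return best


if __name__ == "__main__":
    # Few samples, many candidates: some pure-noise feature looks strongly associated.
    print(max_spurious_correlation(n_samples=20, n_features=5000))
    # Many samples, few candidates: chance correlations stay small.
    print(max_spurious_correlation(n_samples=500, n_features=50))
```

Any selection rule that simply ranks candidates by marginal association would pick up such noise features, which is why regularization or ensemble methods are used when candidates far outnumber samples.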


Summary

Introduction

Most complex traits such as obesity involve a diverse set of genes, intricate interplay among them and subtle interactions between genetic and environmental factors. One of the first steps toward a systematic understanding of the genetic basis of a complex trait is the identification of causal genetic elements, e.g. genes, genetic markers and/or single nucleotide polymorphisms (SNPs), whose variations are responsible for the trait. The objective of this challenging task is two-fold: effectively identifying, from a large pool of candidates, a subset of genetic elements whose patterns are characteristic of a trait of interest, and accurately predicting the phenotype with a model that accommodates interactions among the selected genetic elements. Based on elastic net regularized regression [16], Chen et al. [17] developed Camelot to predict quantitative response
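The elastic net [16] combines a lasso (L1) penalty, which sets some coefficients exactly to zero, with a ridge (L2) penalty, which stabilizes estimates among correlated predictors. As a rough illustration only (this is not Camelot [17], whose details are not given here), the sketch below fits the common glmnet-style elastic-net objective by cyclic coordinate descent in pure Python. The names lam, alpha and n_iter and their defaults are illustrative assumptions, and the response is assumed to be centred so that no intercept is fitted.

```python
def soft_threshold(z, gamma):
    """S(z, gamma) = sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def elastic_net(X, y, lam=0.1, alpha=0.5, n_iter=200):
    """Minimise (1/2n)*||y - X b||^2 + lam*(alpha*||b||_1 + (1-alpha)/2*||b||_2^2)
    by cyclic coordinate descent. X is a list of rows; y is assumed centred."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # y - X b, with b = 0 initially
    # (1/n) * sum of squares of each column
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:
                continue  # an all-zero column cannot be updated
            # (1/n) * x_j . (partial residual that excludes feature j)
            rho = sum(X[i][j] * residual[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam * alpha) / (col_sq[j] + lam * (1 - alpha))
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta  # keep residual = y - X b
                beta[j] = new
    return beta


if __name__ == "__main__":
    X = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
    y = [2.0, 1.0, -2.0, -1.0]
    print(elastic_net(X, y, lam=0.0))               # no penalty: least squares
    print(elastic_net(X, y, lam=0.1, alpha=0.5))    # shrunken coefficients
    print(elastic_net(X, y, lam=10.0, alpha=1.0))   # strong lasso: all zero
```

With lam=0 on this orthogonal toy design the fit recovers the least-squares coefficients, while a large lam with alpha=1 drives every coefficient exactly to zero; this selection behaviour is what makes the elastic net attractive when candidate genetic elements far outnumber samples.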

