This paper explores the genotype-phenotype relationship. It outlines conditions under which the dependence of a quantitative trait on the genome might be predictable, based on measurement of a limited subset of genotypes. It uses the theory of real-valued Boolean functions in a systematic way to translate trait data into the Fourier domain. Important trait features, such as the roughness of the trait landscape or the modularity of a trait have a simple Fourier interpretation. Ruggedness at a gene location corresponds to high sensitivity to mutation, while a modular organization of gene activity reduces such sensitivity.Traits where rugged loci are rare will naturally compress gene data in the Fourier domain, leading to a sparse representation of trait data, concentrated in identifiable, low-level coefficients. This Fourier representation of a trait organizes epistasis in a form which is isometric to the trait data. As Fourier matrices are known to be maximally incoherent with the standard basis, this permits employing compressive sensing techniques to work from data sets that are relatively small—sometimes even of polynomial size—compared to the exponentially large sets of possible genomes.This theory provides a theoretical underpinning for systematic use of Boolean function machinery to dissect the dependency of a trait on the genome and environment.
Read full abstract