We present an application of computational inverse design, which reverses the conventional trial-and-error forward design paradigm and optimizes biological phenotype by directly modifying genotype. In any context, inverse design is limited by the fundamental "one-to-many" nature of the inverse function. Genotype-to-bulk phenotype (G-BP) inverse design is further limited by the number of single nucleotide polymorphisms (SNPs) that can be introduced into a member of the population while keeping genotype creation feasible and the organism biologically viable. These limitations can be addressed via an established design paradigm, "design, build, test, learn" (DBTL), in which computational inverse design automates both the design and learn phases. We therefore propose a design paradigm based on incremental optimization of phenotype through a combined computational and experimental approach. We intend this work to be a foundational synthesis of well-known techniques applied to G-BP inverse design, an application that has not yet been reported in the literature. The design pipeline can optimize phenotype either by directly proposing genotypic changes or by suggesting parents to be used for selective breeding. The soybean nested association mapping (NAM) data set is used to present an in silico case study of the design pipeline, in which the optimization maximizes protein content while constraining other phenotypes. A random forest (RF) models the genotype-to-phenotype relationship, and a genetic algorithm queries the RF until a feasible genotype with the desired phenotype is discovered. After 20 in silico DBTL cycles, the pipeline suggests a final population of individuals with a mean protein content of 36.13%, three standard deviations above the original mean.
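As a rough illustration of the search step described above, the following minimal sketch shows a genetic algorithm that queries a surrogate model while limiting the number of SNPs changed relative to a starting genotype. It is not the authors' implementation: the function names (`ga_search`, `toy_predict`), the binary SNP encoding, the budget `max_snps`, and all parameter values are assumptions made for illustration, and a toy linear scorer stands in for the trained random forest. Constraints on other phenotypes are omitted.

```python
import random


def hamming(a, b):
    """Number of loci at which two genotypes differ."""
    return sum(x != y for x, y in zip(a, b))


def mutate(genotype, parent, max_snps, rng):
    """Flip one random locus; keep the flip only if the SNP budget holds."""
    child = list(genotype)
    i = rng.randrange(len(child))
    child[i] = 1 - child[i]
    return child if hamming(child, parent) <= max_snps else list(genotype)


def crossover(a, b, parent, max_snps, rng):
    """Single-point crossover; fall back to `a` if the budget is exceeded."""
    cut = rng.randrange(1, len(a))
    child = a[:cut] + b[cut:]
    return child if hamming(child, parent) <= max_snps else list(a)


def ga_search(predict, parent, max_snps=3, pop_size=30, generations=40, seed=0):
    """Maximize predict(genotype) within max_snps edits of `parent`."""
    rng = random.Random(seed)
    population = [list(parent) for _ in range(pop_size)]
    for _ in range(generations):
        offspring = []
        for _ in range(pop_size):
            a, b = rng.sample(population, 2)
            child = crossover(a, b, parent, max_snps, rng)
            offspring.append(mutate(child, parent, max_snps, rng))
        # Elitist selection: keep the best individuals from both generations.
        merged = population + offspring
        population = sorted(merged, key=predict, reverse=True)[:pop_size]
    return population[0]


# Hypothetical per-locus effects; a trained RF's predict would go here instead.
weights = [0.4, -0.2, 0.9, 0.1, -0.5, 0.7, 0.3, -0.1, 0.6, 0.2, -0.3, 0.8]
parent = [0] * len(weights)


def toy_predict(genotype):
    """Toy surrogate (assumption): predicted phenotype as a weighted sum."""
    return sum(w * g for w, g in zip(weights, genotype))


best = ga_search(toy_predict, parent, max_snps=3)
print(best, toy_predict(best), hamming(best, parent))
```

In a DBTL setting, the genotype returned here would correspond to one "design" proposal; after building and testing it, the new measurements would be used to retrain the surrogate before the next cycle.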