In directed evolution (DE), the assessment of candidate enzymes and their modification are essential. In this study we investigated genetic algorithms (GAs) in this context and conducted a systematic study of their behavior on 20 fitness landscapes (FLs) of varying complexity. This has allowed the tuning of the GAs to be explored. On the basis of this study, we report recommendations for the best GA settings to use in a GA-directed high-throughput experimental program (in which population sizes and the number of generations are necessarily low). The FLs were based upon simple linear models and were characterized by the behavior of the GA on the landscape, as demonstrated by stall plots and by the footprints and adhesion of candidate solutions, which highlighted local optima (LOs). To maximize the progress of the GA and to reduce the chances of becoming stuck in an LO, it was best to use: 1) a large number of generations, 2) high populations, 3) removal of duplicate sequences (clones), 4) double mutation, 5) high selection pressure (the two best individuals go to the next generation), and 6) a designed sequence, where one is available, as the starting point of the GA run. We believe that these recommendations might be appropriate starting points for studies employing GAs within DE experiments.
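The recommended settings can be illustrated with a minimal Python sketch. It is not the implementation used in the study, and several details are assumptions made only for illustration: the landscape is an additive per-position weight table (one possible reading of "simple linear models"), "double mutation" is read as two point substitutions per offspring, crossover is omitted, and all names (`make_linear_landscape`, `double_mutate`, `run_ga`) and default values are hypothetical.

```python
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def make_linear_landscape(length, rng):
    """Assumed additive landscape: one random weight per position and residue."""
    return [{aa: rng.random() for aa in ALPHABET} for _ in range(length)]


def fitness(seq, weights):
    """Fitness is the sum of the per-position contributions."""
    return sum(weights[i][aa] for i, aa in enumerate(seq))


def double_mutate(seq, rng):
    """Substitute two distinct positions, each with a different residue."""
    chars = list(seq)
    for pos in rng.sample(range(len(seq)), 2):
        chars[pos] = rng.choice([aa for aa in ALPHABET if aa != chars[pos]])
    return "".join(chars)


def run_ga(weights, start, pop_size=20, generations=30, n_parents=2, seed=0):
    """Elitist GA: the best n_parents survive and seed every offspring."""
    rng = random.Random(seed)
    population = [start]  # recommendation 6: begin from a designed sequence
    history = []
    for _ in range(generations):
        # Recommendation 5: high selection pressure, only the best two go on.
        parents = sorted(population, key=lambda s: fitness(s, weights),
                         reverse=True)[:n_parents]
        next_pop = list(parents)
        seen = set(parents)
        while len(next_pop) < pop_size:
            child = double_mutate(rng.choice(parents), rng)  # recommendation 4
            if child not in seen:  # recommendation 3: discard clones
                seen.add(child)
                next_pop.append(child)
        population = next_pop
        history.append(max(fitness(s, weights) for s in population))
    best = max(population, key=lambda s: fitness(s, weights))
    return best, history


if __name__ == "__main__":
    landscape_rng = random.Random(1)
    weights = make_linear_landscape(10, landscape_rng)
    start = "A" * 10  # placeholder for a designed starting sequence
    best, history = run_ga(weights, start)
    print(best, round(history[-1], 3))
```

Because the two best individuals are carried over unchanged, the best fitness per generation in `history` cannot decrease; a run of identical values in `history` corresponds to the GA stalling. Raising `pop_size` and `generations` mirrors recommendations 1 and 2, within whatever limits the experimental program allows.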