The integration of multiple genes to maximize protein expression levels is an important challenge in synthetic biology. This task relies on the definition of multiple protein-coding sequences, which must be as different as possible to avoid information loss. Proteins can be encoded in different ways, using synonymous codons that translate into the same amino acid. Some codons are better suited to the host than others, so the best-adapted ones are preferable. However, using only the most highly adapted codons would lead to very similar coding sequences. An additional criterion is that the designed sequences must have a guanine–cytosine (GC) ratio consistent with the characteristics of the host organism. Therefore, this biological task requires the simultaneous optimization of several conflicting objectives. This work proposes a novel multi-objective approach for protein encoding, which tackles the problem according to a new formulation based on three objective functions: codon adaptation index, Hamming distance between sequences, and GC content. Our work extends the recent Butterfly Optimization Algorithm to multi-objective contexts, integrating problem-specific operators that boost solution quality by covering the different aspects required for accurate protein encoding. Two key structures, a taboo list and a best solution list, are defined to guide the search according to the potential improvement that each solution in the population can contribute. Experiments conducted on nine real-world proteins yield relevant solutions from different evaluation perspectives and show significant improvements over other single and multi-objective methods from the literature.
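To make the three objectives named above more concrete, the following is a minimal Python sketch of how a set of candidate coding sequences might be scored. It is an illustration under stated assumptions, not the formulation used in this work: the codon weights in `WEIGHTS` are invented toy values, the codon adaptation index is taken as the usual geometric mean of relative adaptiveness weights, diversity is taken as the minimum normalized pairwise Hamming distance, and the GC objective is taken as the largest deviation from a hypothetical target `gc_target`. All function names are hypothetical.

```python
import math
from itertools import combinations

# Hypothetical relative-adaptiveness weights for a toy host (assumption);
# real values would be derived from the host's codon usage table.
WEIGHTS = {
    "GCU": 1.0, "GCC": 0.6, "GCA": 0.3, "GCG": 0.2,  # alanine
    "AAA": 1.0, "AAG": 0.4,                          # lysine
    "GAA": 1.0, "GAG": 0.5,                          # glutamate
}

def cai(cds):
    """Codon adaptation index: geometric mean of the codon weights."""
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    return math.exp(sum(math.log(WEIGHTS[c]) for c in codons) / len(codons))

def gc_content(cds):
    """Fraction of bases that are G or C."""
    return sum(base in "GC" for base in cds) / len(cds)

def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def evaluate(cdss, gc_target=0.5):
    """Return (mean CAI, min normalized Hamming distance, max GC deviation).

    The first two values are to be maximized and the third minimized.
    The aggregation choices (mean, min, max) are assumptions of this sketch.
    """
    mean_cai = sum(cai(s) for s in cdss) / len(cdss)
    diversity = min(hamming(a, b) for a, b in combinations(cdss, 2)) / len(cdss[0])
    gc_dev = max(abs(gc_content(s) - gc_target) for s in cdss)
    return mean_cai, diversity, gc_dev

# Two synonymous encodings of the peptide Ala-Lys-Glu.
candidates = ["GCUAAAGAA", "GCCAAGGAG"]
scores = evaluate(candidates)
```

The example shows the conflict the abstract describes: the first candidate uses only the best-adapted codons and reaches the highest CAI, while the second must use less-adapted synonymous codons to differ from it, which raises the Hamming distance but lowers the mean CAI.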