A statistical-physics approach for codon usage optimisation

David Luna-Cerralbo, Irene Blasco-Machín, Susana Adame-Pérez, Verónica Lampaya, Ana Larraga, Teresa Alejo, Juan Martínez-Oliván, Esther Broset, Pierpaolo Bruscolini

doi:10.1016/j.csbj.2024.07.020

Abstract

“Codon optimisation” means adjusting the coding sequence of a target protein to account for the inherent codon preferences of a host species, with the aim of maximising protein expression in that species. However, there is still no consensus on the most effective way to do this. Existing methods typically depend on heuristic combinations of different variables and leave the final choice of sequence to the user. In this study, we propose a new statistical-physics model for codon optimisation. This model, called the Nearest-Neighbour interaction (NN) model, links the probability of any given codon sequence to the “interactions” between neighbouring codons. We used the model to design codon sequences for different proteins of interest, and we compared our sequences with those predicted by some commercial tools. To assess the importance of the pair interactions, we also compared the NN model with a simpler method (Ind) that disregards interactions. The NN method yielded Codon Adaptation Index (CAI) values similar to those obtained with the commercial algorithms, even though CAI was not explicitly considered in the algorithm. We used both the NN and Ind methods to optimise the reporter protein luciferase and then analysed translation performance in human cell lines and in a mouse model; the NN approach yielded the highest protein expression in vivo. Consequently, we propose that the NN model may prove advantageous in biotechnological applications such as heterologous protein expression or mRNA-based therapies.
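To make the contrast between the Ind and NN approaches concrete, the following is a minimal toy sketch in Python. It is not the authors' implementation: the abstract does not give the functional form of the model, so the sketch assumes that a sequence is scored by a sum of single-codon log-frequencies plus a pairwise term between adjacent codons, and that the best sequence under that score is found by dynamic programming. All names (`SYNONYMS`, `CODON_FREQ`, `PAIR_SCORE`, `design_ind`, `design_nn`) and all numerical values are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical toy data: two amino acids, each with two synonymous codons.
SYNONYMS = {"K": ["AAA", "AAG"], "F": ["TTT", "TTC"]}
# Made-up single-codon frequencies (illustrative only, not real host data).
CODON_FREQ = {"AAA": 0.43, "AAG": 0.57, "TTT": 0.46, "TTC": 0.54}
# Made-up pair scores on a log scale for adjacent codons; unlisted pairs score 0.
PAIR_SCORE = {("AAG", "TTC"): -1.0, ("AAG", "TTT"): 0.5, ("AAA", "TTC"): 0.2}


def score(codons):
    """Assumed score: single-codon log-frequencies plus neighbour pair terms."""
    single = sum(math.log(CODON_FREQ[c]) for c in codons)
    pairs = sum(PAIR_SCORE.get(p, 0.0) for p in zip(codons, codons[1:]))
    return single + pairs


def design_ind(protein):
    """Independent choice: the most frequent codon at each position."""
    return [max(SYNONYMS[aa], key=CODON_FREQ.get) for aa in protein]


def design_nn(protein):
    """Nearest-neighbour choice: maximise `score` by dynamic programming."""
    if not protein:
        return []
    # best[c] = (best score of any path ending in codon c, that path)
    best = {c: (math.log(CODON_FREQ[c]), [c]) for c in SYNONYMS[protein[0]]}
    for aa in protein[1:]:
        new_best = {}
        for c in SYNONYMS[aa]:
            # Pick the predecessor codon that maximises score up to codon c.
            prev = max(best, key=lambda p: best[p][0] + PAIR_SCORE.get((p, c), 0.0))
            total = best[prev][0] + PAIR_SCORE.get((prev, c), 0.0) + math.log(CODON_FREQ[c])
            new_best[c] = (total, best[prev][1] + [c])
        best = new_best
    return max(best.values(), key=lambda t: t[0])[1]


if __name__ == "__main__":
    for name, design in (("Ind", design_ind), ("NN", design_nn)):
        seq = design("KF")
        print(name, "".join(seq), round(score(seq), 3))
```

With these made-up numbers the two methods choose different codons for the second residue, because the assumed pair term penalises the combination of the two individually most frequent codons. In a realistic setting the single-codon and pair terms would have to be estimated from host sequence data, which this sketch does not attempt.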

Full Text