ICOR: improving codon optimization with recurrent neural networks

Rishab Jain, Douglas Densmore, Aditya Jain, Kevin Leshane, Elizabeth Mauro

doi:10.1186/s12859-023-05246-8


Open Access

https://doi.org/10.1186/s12859-023-05246-8


Abstract

Background: Because there are 61 sense codons but only 20 standard amino acids, most amino acids are encoded by more than one codon. Although such synonymous codons do not alter the encoded amino acid sequence, their selection can dramatically affect the expression of the resulting protein. Codon optimization of synthetic DNA sequences is important for heterologous expression. However, existing solutions are primarily based on choosing high-frequency codons only, neglecting the important effects of rare codons. In this paper, we propose a novel recurrent-neural-network-based codon optimization tool, ICOR, that aims to learn codon usage bias on a genomic dataset of Escherichia coli. We compile a dataset of over 7,000 non-redundant, high-expression, robust genes, which is used for deep learning. The model uses a bidirectional long short-term memory-based architecture, allowing the sequential context of codon usage in genes to be learned. Our tool can predict synonymous codons for synthetic genes toward optimal expression in Escherichia coli.

Results: We demonstrate that sequential context achieved via an RNN may yield codon selection that is more similar to the host genome. Based on computational metrics that predict protein expression, ICOR theoretically optimizes protein expression more than frequency-based approaches. ICOR is evaluated on 1,481 Escherichia coli genes as well as a benchmark set of 40 select DNA sequences whose heterologous expression has been previously characterized. ICOR's performance is measured across five metrics: the Codon Adaptation Index, GC-content, negative repeat elements, negative cis-regulatory elements, and codon frequency distribution.

Conclusions: The results, based on in silico metrics, indicate that ICOR codon optimization is theoretically more effective in enhancing recombinant expression of proteins than other established codon optimization techniques. Our tool is provided as an open-source software package that includes the benchmark set of sequences used in this study.
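Two of the metrics named in the abstract, GC-content and the Codon Adaptation Index (CAI), are simple enough to illustrate directly. The following is a minimal sketch, not ICOR's own code: it builds the standard genetic code (61 sense codons, 20 amino acids), derives relative-adaptiveness weights from a set of reference genes, and scores a sequence by the geometric mean of those weights, following the usual Sharp and Li definition of CAI. The function names, the pseudocount of 0.5, and the toy reference gene are assumptions made for illustration only.

```python
from collections import Counter
from itertools import product
from math import exp, log

# Standard genetic code, codons enumerated in TCAG order ("*" marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
SENSE_CODONS = [c for c, aa in CODON_TABLE.items() if aa != "*"]


def codons(seq):
    """Split a DNA sequence into consecutive codons, dropping any trailing partial codon."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def gc_content(seq):
    """Fraction of bases in seq that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def relative_adaptiveness(reference_genes, pseudocount=0.5):
    """Weight each sense codon by its count relative to the most-used synonymous codon.

    The pseudocount (an assumption of this sketch) keeps unseen codons from
    receiving a weight of zero.
    """
    counts = Counter(c for gene in reference_genes for c in codons(gene)
                     if c in CODON_TABLE)
    weights = {}
    for codon in SENSE_CODONS:
        aa = CODON_TABLE[codon]
        family = [c for c in SENSE_CODONS if CODON_TABLE[c] == aa]
        best = max(counts[c] + pseudocount for c in family)
        weights[codon] = (counts[codon] + pseudocount) / best
    return weights


def cai(seq, weights):
    """Geometric mean of codon weights, skipping stop codons and Met/Trp codons."""
    logs = [log(weights[c]) for c in codons(seq)
            if c in weights and CODON_TABLE[c] not in "MW"]
    if not logs:
        raise ValueError("sequence contains no scorable codons")
    return exp(sum(logs) / len(logs))


# Toy reference set (hypothetical): CTG is the preferred Leu codon, AAA the preferred Lys codon.
reference = ["ATGCTGCTGCTGAAAAAAAAGTAA"]
w = relative_adaptiveness(reference)
print(gc_content("ATGC"), cai("CTGAAA", w), cai("TTAAAG", w))
```

In this sketch a sequence built only from the reference set's preferred codons scores a CAI of 1, and substituting rarer synonymous codons lowers the score. A frequency-based optimizer maximizes exactly this quantity, which is the behavior the abstract contrasts with learning sequential codon context.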


Journal: BMC Bioinformatics	Publication Date: Apr 4, 2023
Citations: 11	License type: open-access


Similar Papers

Author response: Distinct responses to rare codons in select Drosophila tissues
Scott R Allen ... Donald T Fox
03 May 2022

Genetic Code-guided Protein Synthesis and Folding in Escherichia coli
Shaoliang Hu ... Mingyue He
Journal of Biological Chemistry | VOL. 288
01 Oct 2013

Determinants of translation efficiency and accuracy
Hila Gingold ... Yitzhak Pilpel
Molecular Systems Biology | VOL. 7
01 Jan 2010

Codon usage of HIV regulatory genes is not determined by nucleotide composition.
Supinya Phakaratsakul ... Chompunuch Boonarkart
Archives of Virology | VOL. 163
25 Oct 2017
