Abstract

Protein stability is affected at several hierarchical levels: gene, RNA, amino acid sequence and structure. The gene is the first level, contributing through varying codon composition. The codon selectivity of an organism differs between normal and extremophilic milieus. The present work details the codon usage patterns of six extremophilic classes and the harmony among them. Homologous gene datasets of thermophile-mesophile, psychrophile-mesophile, thermophile-psychrophile, acidophile-alkaliphile, halophile-nonhalophile and barophile-nonbarophile pairs were analysed to filter statistically significant attributes. Relative abundance analysis, 1–9 scale ranking, nucleotide compositions, attribute weighting and machine learning algorithms were employed to arrive at the findings. AGG in thermophiles and barophiles, CAA in mesophiles and psychrophiles, TGG in acidophiles, GAG in alkaliphiles and GAC in halophiles had the highest preference. A preference for GC-rich and G/C-ending codons was observed in halophiles and barophiles, whereas a decreasing trend was seen in psychrophiles and alkaliphiles. In thermophiles, GC-rich codons decreased and G/C-ending codons increased, whereas acidophiles showed equal contents of GC-rich and G/C-ending codons. Codon usage patterns exhibited harmony among the different extremophiles, which is detailed here. However, the codon attribute preferences and selectivity of extremophiles varied in comparison to non-extremophiles. These findings can be instrumental in codon optimization for heterologous expression of extremophilic proteins.
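The codon attributes named in the abstract (codon counts, GC-rich codons, G/C-ending codons) can be illustrated with a minimal Python sketch. This is not the authors' pipeline. The function names are hypothetical, and the definition of a "GC-rich" codon used here (at least two G or C nucleotides) is an assumption, since the abstract does not state the threshold. A G/C-ending codon is taken to be one whose third position is G or C.

```python
from collections import Counter


def codon_counts(cds: str) -> Counter:
    """Count in-frame codons of a coding DNA sequence (CDS)."""
    seq = cds.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    return Counter(seq[i:i + 3] for i in range(0, usable, 3))


def gc_ending_fraction(counts: Counter) -> float:
    """Fraction of codons whose third nucleotide is G or C."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(n for codon, n in counts.items() if codon[2] in "GC") / total


def gc_rich_fraction(counts: Counter, min_gc: int = 2) -> float:
    """Fraction of codons with at least `min_gc` G/C nucleotides.

    The threshold of 2 is an assumption made for illustration only.
    """
    total = sum(counts.values())
    if total == 0:
        return 0.0
    rich = sum(
        n for codon, n in counts.items()
        if sum(base in "GC" for base in codon) >= min_gc
    )
    return rich / total


# Toy sequence (not real data): ATG AGG CAA GAC TGG GAG TAA
counts = codon_counts("ATGAGGCAAGACTGGGAGTAA")
print(counts["AGG"], gc_ending_fraction(counts), gc_rich_fraction(counts))
```

Computing these fractions per CDS for each member of a homologous pair would give the kind of per-class attribute values that can then be compared or ranked.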

Highlights

  • The genetic codes are coding units for translation of nucleic acid into protein sequences

  • The present study commenced with the data collection of coding DNA sequences (CDS) of homologous extremophilic and non-extremophilic proteins

  • CLUSS2, a non-alignment-based method measuring Substitution Matching Similarity, was chosen[16]. This led to the selection of homologous extremophilic and non-extremophilic pairs constituting six datasets (T-M, thermophiles-mesophiles dataset; P-M, psychrophiles-mesophiles dataset; T-P, thermophiles-psychrophiles dataset; B-Nb, barophiles-nonbarophiles dataset; H-Nh, halophiles-nonhalophiles dataset; and A-B, acidophiles-alkaliphiles dataset)

Summary

Introduction

The genetic codes are coding units for translation of nucleic acid into protein sequences. Extremophiles have developed molecular mechanisms for physicochemical adaptation to their extreme milieu at multiple levels. Each level comprises numerous attributes, which require further exploration[5,6]. This has usually been done by comparing their genomic features, the sequence and order of genes, codon usage patterns, gene regulation and expression. Zeldovich et al. (2007) revealed that the codon usage pattern creates a direct link between the principles of protein stability and the evolutionary mechanisms of extremophilic adaptation[11]. CDS of proteins with extreme optimum pH were collected (acid stable, pH ≤ 6; alkaline stable, pH ≥ 8); CLUSS 2. To further elucidate the codon usage patterns, various approaches were employed to generate prediction models for classifying extremophilic CDS from their normal counterparts.
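The pH criterion quoted above (acid stable, pH ≤ 6; alkaline stable, pH ≥ 8) can be written as a small labelling rule. The sketch below only restates those two thresholds. The function name and the choice to return `None` for proteins between the two cut-offs are assumptions made for illustration, not part of the described method.

```python
from typing import Optional

ACID_MAX_PH = 6.0       # acid stable: optimum pH <= 6 (from the text)
ALKALINE_MIN_PH = 8.0   # alkaline stable: optimum pH >= 8 (from the text)


def ph_class(optimum_ph: float) -> Optional[str]:
    """Label a protein by its optimum pH; None if it meets neither cut-off."""
    if optimum_ph <= ACID_MAX_PH:
        return "acid stable"
    if optimum_ph >= ALKALINE_MIN_PH:
        return "alkaline stable"
    return None  # hypothetical handling: excluded from the A-B dataset


# Made-up optimum pH values, for illustration only
for ph in (3.5, 7.0, 9.2):
    print(ph, ph_class(ph))
```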
