A computational framework for exploring structural protein variability in virus variants using a codon network model

Abstract

This study applies a graph-theoretic framework to analyze the structural dynamics of codon networks derived from SARS-CoV-2 spike protein sequences. By employing a dual-level analysis of Minimum Connected Dominating Sets (MCDS) and community structures, we explore the mathematical underpinnings of viral protein organization. First, we construct the MCDS to identify critical codons that ensure global network connectivity, providing key insights into structurally significant regions of the protein sequence. Next, we analyze the community structures within the network to determine localized structural and functional roles, facilitating the identification of specialized codon groups. Centrality measures are employed to quantify the significance of codons within both the MCDS and the identified communities, highlighting their roles in maintaining network integrity. Furthermore, we investigate the impact of mutations across SARS-CoV-2 variants, assessing their influence on codon connectivity and functional stability. A statistical analysis of MCDS and community node variability provides deeper insights into the structural robustness of the spike protein. This study underscores the potential of mathematical modeling in virology and highlights essential codons as potential targets for therapeutic intervention.
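The backbone-selection step described above can be illustrated with a small sketch. The abstract does not say how the codon network is built or which MCDS algorithm is used, so everything here is an assumption: codons are linked when they are adjacent in the sequence, the toy sequence is made up, and `greedy_cds` is a simple greedy heuristic that returns a connected dominating set which is not guaranteed to be minimum (the exact problem is NP-hard). The function names are hypothetical.

```python
from collections import defaultdict


def codon_graph(rna):
    """Link codons that sit next to each other in the sequence (illustrative rule only)."""
    usable = len(rna) - len(rna) % 3           # drop a trailing partial codon
    codons = [rna[i:i + 3] for i in range(0, usable, 3)]
    adj = defaultdict(set)
    for c in codons:
        adj[c]                                  # register every codon as a node
    for a, b in zip(codons, codons[1:]):
        if a != b:                              # skip self-loops
            adj[a].add(b)
            adj[b].add(a)
    return dict(adj)


def greedy_cds(adj):
    """Greedy connected dominating set: a heuristic, not guaranteed to be minimum."""
    nodes = set(adj)
    # Seed with the highest-degree codon (ties broken alphabetically).
    start = max(sorted(nodes), key=lambda n: len(adj[n]))
    cds, dominated = {start}, {start} | adj[start]
    while dominated != nodes:
        # Candidates touch the current set, so the result stays connected.
        frontier = sorted(dominated - cds)
        if not frontier:
            raise ValueError("graph is not connected")
        best = max(frontier, key=lambda n: len(adj[n] - dominated))
        if not adj[best] - dominated:
            raise ValueError("graph is not connected")
        cds.add(best)
        dominated |= adj[best]
    return cds


def degree_centrality(adj):
    """Degree of each node divided by the maximum possible degree."""
    n = len(adj)
    return {v: (len(nb) / (n - 1) if n > 1 else 0.0) for v, nb in adj.items()}


toy = "AUGGCUAUGGAUGCUUAA"                      # made-up sequence, not spike data
network = codon_graph(toy)
backbone = greedy_cds(network)
scores = degree_centrality(network)
print(sorted(backbone), {c: round(scores[c], 2) for c in backbone})
```

Comparing the backbone sets obtained for a reference sequence and for a variant sequence would be one simple way to see which codons enter or leave the dominating set after mutation; any other centrality measure could be substituted for degree centrality.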

Similar Papers
  • Research Article
  • Citations: 1
  • 10.46829/hsijournal.2023.6.4.1.443-447
Human SARS CoV-2 spike protein mutations in West Africa
  • May 25, 2023
  • Health Sciences Investigations Journal
  • Samuel O Olalekan + 3 more

Background: The COVID-19 pandemic was caused by the severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2), first detected in Wuhan, Hubei province, China in December 2019. The virus rapidly spread worldwide, with mutations in various parts of its genetic material affecting its transmissibility and infectivity. Objective: This study addressed some of the mutations present in the human SARS-CoV-2 spike proteins relative to the Wuhan-Hu-1 reference sequence from China, according to different countries from West Africa. Methods: The SARS-CoV-2 spike protein sequences were obtained from the National Center for Biotechnology Information virus database in the FASTA format on November 12, 2021. The multiple sequence alignment of the proteins was carried out by MAFFT version 7 online. The human SARS-CoV-2 spike protein sequences from selected West African countries were analyzed by comparing them with the reference SARS-CoV-2 protein sequence from Wuhan-Hu-1, China. Results: Out of 148 spike protein sequences analyzed, 137 proteins had one or more mutations. A total of 486 mutations were observed, corresponding to 47 distinct mutation sites. The Receptor Binding Domain, which interacts with the human angiotensin-converting enzyme-2 (ACE-2) receptor to cause the infection that leads to COVID-19, had 8 distinct mutation sites. The D614G mutation was the most common spike protein mutation across all the West African countries examined in this study. We also examined spike proteins without mutations, the distribution of mutations in spike proteins, mutation density in different regions of the spike protein sequence, spike protein sequences with multiple mutations, and the implications of human SARS-CoV-2 spike protein mutations in West Africa for vaccination and drug development.
Conclusion: The identified mutations in SARS-CoV-2 are significant for infection prevention, control, and public health interventions. Further studies are imperative to understand the mutations in the virus's spike proteins to guide vaccine development and antiviral drug designs. Investigations should also be conducted to determine the infectivity of emerging variants in West Africa and their response to vaccines and available drugs to address public health concerns on vaccination and drug design goals.
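The comparison against a reference sequence described in the Methods can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes the sequences have already been aligned (for example with MAFFT), it reports substitutions only and ignores deletions and insertions, and the five-residue sequences are invented toy data. The function names `call_substitutions` and `summarise` are hypothetical.

```python
from collections import Counter


def call_substitutions(ref_aln, seq_aln):
    """List substitutions such as 'D614G' between two aligned protein sequences."""
    if len(ref_aln) != len(seq_aln):
        raise ValueError("aligned sequences must have equal length")
    muts, pos = [], 0
    for r, s in zip(ref_aln, seq_aln):
        if r == "-":
            continue                    # insertion: no reference position to number
        pos += 1                        # 1-based position in the ungapped reference
        if s not in ("-", "X") and s != r:
            muts.append(f"{r}{pos}{s}")
    return muts


def summarise(ref_aln, aligned):
    """Count mutated sequences, total substitutions, and distinct mutation sites."""
    per_seq = {name: call_substitutions(ref_aln, s) for name, s in aligned.items()}
    counts = Counter(m for muts in per_seq.values() for m in muts)
    sites = {int(m[1:-1]) for m in counts}
    return {
        "sequences_with_mutations": sum(1 for m in per_seq.values() if m),
        "total_mutations": sum(counts.values()),
        "distinct_sites": len(sites),
        "most_common": counts.most_common(1),
    }


reference = "MFVDL"                                         # toy reference, not Wuhan-Hu-1
samples = {"a": "MFVGL", "b": "MFVDL", "c": "MYVGL"}        # toy aligned sequences
print(summarise(reference, samples))
```

The three summary counts correspond to the kinds of figures reported in the Results (sequences with at least one mutation, total mutations, distinct mutation sites), and the most frequent entry plays the role that D614G plays in the study.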

  • Research Article
  • Citations: 34
  • 10.1016/j.crstbi.2022.01.002
Mutations in human SARS-CoV-2 spike proteins, potential drug binding and epitope sites for COVID-19 therapeutics development.
  • Jan 1, 2022
  • Current Research in Structural Biology
  • Kunchur Guruprasad


  • Research Article
  • Citations: 162
  • 10.1002/prot.26042
Human SARS CoV-2 spike protein mutations.
  • Jan 17, 2021
  • Proteins
  • Lalitha Guruprasad

The human spike protein sequences from Asia, Africa, Europe, North America, South America, and Oceania were analyzed by comparing with the reference severe acute respiratory syndrome coronavirus‐2 (SARS‐CoV‐2) protein sequence from Wuhan‐Hu‐1, China. Out of 10333 spike protein sequences analyzed, 8155 proteins comprised one or more mutations. A total of 9654 mutations were observed that correspond to 400 distinct mutation sites. The receptor binding domain (RBD) which is involved in the interactions with human angiotensin‐converting enzyme‐2 (ACE‐2) receptor and causes infection leading to the COVID‐19 disease comprised 44 mutations that included residues within 3.2 Å interacting distance from the ACE‐2 receptor. The mutations observed in the spike proteins are discussed in the context of their distribution according to the geographical locations, mutation sites, mutation types, distribution of the number of mutations at the mutation sites and mutations at the glycosylation sites. The density of mutations in different regions of the spike protein sequence and location of the mutations in protein three‐dimensional structure corresponding to the RBD are discussed. The mutations identified in the present work are important considerations for antibody, vaccine, and drug development.

  • Peer Review Report
  • 10.7554/elife.61312.sa1
Decision letter: Escape from neutralizing antibodies by SARS-CoV-2 spike protein variants
  • Oct 9, 2020
  • David Montefiore + 1 more


  • Preprint Article
  • Citations: 116
  • 10.26434/chemrxiv.12827966.v1
HUMAN SARS CoV-2 SPIKE PROTEIN MUTATIONS
  • Jan 17, 2021
  • Lalitha Guruprasad

The human SARS-CoV-2 spike protein sequences from Asia, Africa, Europe, North America, South America and Oceania were analyzed by comparing with the reference SARS-CoV-2 protein sequence from Wuhan-Hu-1, China. Out of 10,333 spike protein sequences analyzed, 8,155 proteins comprised one or more mutations. A total of 9,654 mutations were observed that correspond to 400 distinct mutation sites. The receptor binding domain (RBD) which is involved in the interactions with human ACE-2 receptor and causes infection leading to the COVID-19 disease comprised 44 mutations that included residues within 3.2 Å interacting distance from the ACE-2 receptor. The mutations observed in the spike proteins are discussed in the context of their distribution according to the geographical locations, mutation sites, mutation types, distribution of the number of mutations at the mutation sites and mutations at the glycosylation sites. The density of mutations in different regions of the spike protein sequence and location of the mutations in protein three-dimensional structure corresponding to the RBD are discussed. The mutations identified in the present work are important considerations for antibody, vaccine and drug development.

  • Book Chapter
  • 10.9734/bpi/ramb/v3/17187d
Molecular Docking and Simulation Approach for Mutational Analysis in International Isolates and Drug Repurposing Against SARS-CoV-2 Spike Protein
  • Feb 28, 2023
  • Swetha Pulakuntla + 6 more

The present study identified the binding interactions of potential drug candidates and validated them using computational methods, which leads the way for in vitro and in vivo studies. The novel SARS-CoV-2 (severe acute respiratory syndrome coronavirus-2), the pathogen responsible for coronavirus disease-19 (COVID-19), is spreading. Since its discovery, it has infected more than 0.65 billion people worldwide, and 6.67 million fatalities are expected by the middle of December 2022. SARS-CoV-2 enters the host cell by binding its surface glycoprotein (S protein) to human ACE2 (angiotensin-converting enzyme 2). Since the molecular interaction of the spike protein (which contains the S1 and S2 sub-domains) with the host cells is regarded as a crucial step in the entry of the virus and the development of the disease, the spike protein is a promising therapeutic target for antiviral medications. Currently, there are no efficient antiviral drugs to prevent COVID-19 infection. In this study, we analyzed 8,719 global spike protein sequences from patients infected with SARS-CoV-2. These SARS-CoV-2 genome sequences were downloaded from the GISAID database. We identified the spike protein sequence using an open reading frame (ORF) tool. All spike protein amino acid sequences were subjected to multiple sequence alignment (MSA) with the Wuhan strain spike protein sequence serving as the query sequence. The alignment shows that all SARS-CoV-2 strain spike proteins are 99.8% identical. In the mutational analysis, we found 639 mutations in the spike protein sequence of SARS-CoV-2 and highlighted 20 common mutations: L5F, T22I, T29I, H49Y, L54F, V90F, S98F, S221L, S254F, V367F, A520S, T572I, D614G, H655Y, P809S, A879S, D936Y, A1020S, A1078S, and H1101Y. Further, we analyzed the crystal structure of the 2019-nCoV chimeric receptor-binding complex with ACE2 (PDB ID: 6VW1) as a major target protein.
The spike receptor binding protein (RBD) was used as the target region for our repurposing studies with FDA-approved drugs. We identified a few potential anti-SARS-CoV-2 drugs (Silmitasertib, AC-55541, Merimepodib, XL413, AZ3451) which, based on their docking scores and binding mode calculations, are expected to bind strongly to motifs of the ACE2 receptor and may provide relief in COVID-19 patients. All these compounds exhibited excellent binding capacity to the SARS-CoV-2 RBD protein. These compounds may be effective in controlling or stopping viral entry and further infection, and our study paves the way for further in vivo studies as well as clinical trials.

  • Research Article
  • Citations: 15
  • 10.1007/s13337-021-00720-4
Mutational analysis in international isolates and drug repurposing against SARS-CoV-2 spike protein: molecular docking and simulation approach.
  • Jul 15, 2021
  • VirusDisease
  • Swetha Pulakuntla + 8 more

The novel SARS-CoV-2 (severe acute respiratory syndrome coronavirus-2) is spreading as the causative pathogen of coronavirus disease-19 (COVID-19). It has infected more than 1.65 billion people all over the world since it was discovered, with 3.43 million deaths reported by mid-May 2021. SARS-CoV-2 enters the host cell by binding its surface glycoprotein (S protein) to human ACE2 (angiotensin-converting enzyme 2). The molecular interaction of the spike protein (which contains the S1 and S2 sub-domains) with the host cells is considered a major step in viral entry and in disease initiation and progression, which identifies the spike protein as a promising therapeutic target for antiviral drugs. Currently, there are no efficient antiviral drugs for the prevention of COVID-19 infection. In this study, we analyzed 8719 global spike protein sequences from patients infected with SARS-CoV-2. These SARS-CoV-2 genome sequences were downloaded from the GISAID database. Using an open reading frame (ORF) tool, we identified the spike protein sequence. All spike protein amino acid sequences were then subjected to multiple sequence alignment (MSA) with the Wuhan strain spike protein sequence as the query sequence, which shows that all SARS-CoV-2 strain spike proteins are 99.8% identical. In the mutational analysis, we found 639 mutations in the spike protein sequence of SARS-CoV-2 and highlighted 20 common mutations: L5F, T22I, T29I, H49Y, L54F, V90F, S98F, S221L, S254F, V367F, A520S, T572I, D614G, H655Y, P809S, A879S, D936Y, A1020S, A1078S, and H1101Y. Further, we analyzed the crystal structure of the 2019-nCoV chimeric receptor-binding complex with ACE2 (PDB ID: 6VW1) as a major target protein.
The spike receptor binding protein (RBD) was used as the target region for our repurposing studies with FDA-approved drugs. We identified a few potential anti-SARS-CoV-2 drugs (Silmitasertib, AC-55541, Merimepodib, XL413, AZ3451) which, based on their docking scores and binding mode calculations, are expected to bind strongly to motifs of the ACE2 receptor and may provide relief in COVID-19 patients.

  • Research Article
  • Citations: 4
  • 10.3390/vaccines12050539
In Silico and In Vitro Evaluation of the Molecular Mimicry of the SARS-CoV-2 Spike Protein by Common Short Constituent Sequences (cSCSs) in the Human Proteome: Toward Safer Epitope Design for Vaccine Development.
  • May 14, 2024
  • Vaccines
  • Yuya Mizuno + 3 more

Spike protein sequences in SARS-CoV-2 have been employed for vaccine epitopes, but many short constituent sequences (SCSs) in the spike protein are present in the human proteome, suggesting that some anti-spike antibodies induced by infection or vaccination may be autoantibodies against human proteins. To evaluate this possibility of "molecular mimicry" in silico and in vitro, we exhaustively identified common SCSs (cSCSs) found both in spike and human proteins bioinformatically. The commonality of SCSs between the two systems seemed to be coincidental, and only some cSCSs were likely to be relevant to potential self-epitopes based on three-dimensional information. Among three antibodies raised against cSCS-containing spike peptides, only the antibody against EPLDVL showed high affinity for the spike protein and reacted with an EPLDVL-containing peptide from the human unc-80 homolog protein. Western blot analysis revealed that this antibody also reacted with several human proteins expressed mainly in the small intestine, ovary, and stomach. Taken together, these results showed that most cSCSs are likely incapable of inducing autoantibodies but that at least EPLDVL functions as a self-epitope, suggesting a serious possibility of infection-induced or vaccine-induced autoantibodies in humans. High-risk cSCSs, including EPLDVL, should be excluded from vaccine epitopes to prevent potential autoimmune disorders.

  • Abstract
  • 10.1016/j.bpj.2022.11.944
Analysis of the conserved and mutated amino acid sequences in the spike protein enhances the understanding of phylogenetic relationship among coronavirus variants from the wild type
  • Feb 10, 2023
  • Biophysical Journal
  • Asmaa Awan + 1 more

In light of the COVID-19 pandemic, an elaborate computational analysis was conducted of coronaviruses, their phylogeny, and the different strains of the pathogen responsible for the disease, SARS-CoV-2. At first, this disease looks like the common cold. However, it can lead to acute respiratory failure, septic shock, and organ failure. Like the Spanish flu pandemic in 1918, the COVID-19 pandemic cost millions of lives across the globe. Coronaviruses consist of four basic proteins: spike, nucleocapsid, membrane, and envelope proteins. Spike proteins (S) are structures protruding from the surface of the virus that facilitate its entry into host cells via interaction with the ACE-2 receptors, which are found on the surface of the host cells. For this research work, the entire sequence of the spike protein was considered, and a table was created comparing the spike protein sequences of the WT and 13 different variants (Alpha, Beta, Gamma, Kappa, Delta, Mu, Epsilon, Lambda, Omicron, Mu, Eta, Zeta, and Theta). This table helped detect the conserved parts of the spike protein sequence across all variants, conserved mutations, and mutations unique to certain variants. Findings from the table were then used to study phylogenetic trees, which explain the emergence of new coronavirus variants and their genetic distances from the WT.

  • Research Article
  • Citations: 20
  • 10.1016/j.matpr.2021.07.163
SARS-CoV 2 spike protein S1 subunit as an ideal target for stable vaccines: A bioinformatic study
  • Jul 15, 2021
  • Materials Today. Proceedings
  • Nagesha S.N + 14 more


  • Research Article
  • 10.26740/jrba.v3n1.p38-44
The Comparison of SARS-CoV-2, SARS-CoV, and MERS-CoV Genome and Spike Protein Variations
  • Mar 31, 2021
  • Jurnal Riset Biologi dan Aplikasinya
  • Choirun Nita Fikriani + 2 more

SARS-CoV-2 is the virus that caused the COVID-19 pandemic. It is a new variant of the SARS-CoV virus and is also closely related to MERS-CoV, which causes similar acute respiratory infections. All these viruses recognize target cells by binding the Receptor Binding Domain (RBD) on the Spike protein to host receptors. This study aimed to determine the SARS-CoV-2, MERS-CoV, and SARS-CoV genome structures, Spike protein sequence differences, and variations of the RBD’s Receptor Binding Motif (RBM). The research used a data mining approach. Genome sequences were downloaded from NCBI, except for the Indonesian samples, which were downloaded from GISAID. Genomic structures, Spike sequences, and RBD structure were analyzed using Bioedit, followed by protein modelling using the SwissModel and PyMol applications. The results showed that the SARS-CoV-2, MERS-CoV, and SARS-CoV genomes share the same genes, but in different numbers and lengths. The SARS-CoV-2 Spike protein sequence was quite similar to the SARS-CoV Spike protein, but very different from the Spike protein of MERS-CoV. There were variations in the RBD’s RBM structure due to mutations that occurred among these viruses. It is suggested that these differences may increase the affinity of the SARS-CoV-2 Spike protein for its hACE2 receptor, making SARS-CoV-2 more infective and more widely spread than SARS-CoV and MERS-CoV. These results are expected to provide basic information for developing agents that inhibit SARS-CoV-2 entry and for preventing its spread.

  • Research Article
  • Citations: 23
  • 10.1016/j.meegid.2021.105153
Longitudinal analysis of SARS-CoV-2 spike and RNA-dependent RNA polymerase protein sequences reveals the emergence and geographic distribution of diverse mutations
  • Nov 18, 2021
  • Infection, Genetics and Evolution
  • William M Showers + 3 more


  • Research Article
  • Citations: 6
  • 10.1016/j.crstbi.2023.100107
Taking stock of the mutations in human SARS-CoV-2 spike proteins: From early days to nearly the end of COVID-19 pandemic
  • Jan 1, 2023
  • Current Research in Structural Biology
  • Lalitha Guruprasad + 2 more


  • Research Article
  • Citations: 13
  • 10.3390/v14010009
Tracking SARS-CoV-2 Spike Protein Mutations in the United States (January 2020—March 2021) Using a Statistical Learning Strategy
  • Dec 21, 2021
  • Viruses
  • Lue Ping Zhao + 9 more

The emergence and establishment of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) variants of interest (VOIs) and variants of concern (VOCs) highlight the importance of genomic surveillance. We propose a statistical learning strategy (SLS) for identifying and spatiotemporally tracking potentially relevant Spike protein mutations. We analyzed 167,893 Spike protein sequences from coronavirus disease 2019 (COVID-19) cases in the United States (excluding 21,391 sequences from VOI/VOC strains) deposited at GISAID from 19 January 2020 to 15 March 2021. Alignment against the reference Spike protein sequence led to the identification of viral residue variants (VRVs), i.e., residues harboring a substitution compared to the reference strain. Next, generalized additive models were applied to model VRV temporal dynamics and to identify VRVs with significant and substantial dynamics (false discovery rate q-value < 0.01; maximum VRV proportion >10% on at least one day). Unsupervised learning was then applied to hierarchically organize VRVs by spatiotemporal patterns and identify VRV-haplotypes. Finally, homology modeling was performed to gain insight into the potential impact of VRVs on Spike protein structure. We identified 90 VRVs, 71 of which had not previously been observed in a VOI/VOC, and 35 of which have emerged recently and are durably present. Our analysis identified 17 VRVs ~91 days earlier than their first corresponding VOI/VOC publication. Unsupervised learning revealed eight VRV-haplotypes of four VRVs or more, suggesting two emerging strains (B1.1.222 and B.1.234). Structural modeling supported a potential functional impact of the D1118H and L452R mutations. The SLS approach equally monitors all Spike residues over time, independently of existing phylogenic classifications, and is complementary to existing genomic surveillance methods.
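The selection rule quoted above (false discovery rate q-value < 0.01 and a maximum daily proportion above 10%) can be sketched in a few lines. The abstract does not state which FDR procedure was used, so the Benjamini-Hochberg adjustment here is an assumption, and the record names, p-values, and proportions are invented placeholders rather than results from the paper. `bh_qvalues` and `select_vrvs` are hypothetical helper names.

```python
def bh_qvalues(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running = min(running, pvals[i] * m / rank)
        q[i] = running
    return q


def select_vrvs(records, q_cut=0.01, prop_cut=0.10):
    """Keep variants passing both the q-value and the peak-proportion filters."""
    qs = bh_qvalues([r["p"] for r in records])
    return [
        r["name"]
        for r, q in zip(records, qs)
        if q < q_cut and max(r["daily_prop"]) > prop_cut
    ]


# Placeholder records: names, p-values and daily proportions are made up.
toy = [
    {"name": "VRV_A", "p": 0.001, "daily_prop": [0.02, 0.15]},
    {"name": "VRV_B", "p": 0.04, "daily_prop": [0.30]},
    {"name": "VRV_C", "p": 0.03, "daily_prop": [0.01]},
    {"name": "VRV_D", "p": 0.50, "daily_prop": [0.50]},
]
print(select_vrvs(toy))
```

In the study itself the p-values come from generalized additive models of each variant's temporal dynamics; the sketch only covers the filtering applied afterwards.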

  • Research Article
  • Citations: 5
  • 10.1093/bib/bbac128
Predicting binding affinities of emerging variants of SARS-CoV-2 using spike protein sequencing data: observations, caveats and recommendations
  • Apr 18, 2022
  • Briefings in Bioinformatics
  • Ruibo Zhang + 2 more

Predicting protein properties from amino acid sequences is an important problem in biology and pharmacology. Protein-protein interactions among SARS-CoV-2 spike protein, human receptors and antibodies are key determinants of the potency of this virus and its ability to evade the human immune response. As a rapidly evolving virus, SARS-CoV-2 has already developed into many variants with considerable variation in virulence among these variants. Utilizing the proteomic data of SARS-CoV-2 to predict its viral characteristics will, therefore, greatly aid in disease control and prevention. In this paper, we review and compare recent successful prediction methods based on long short-term memory (LSTM), transformer, convolutional neural network (CNN) and a similarity-based topological regression (TR) model and offer recommendations about appropriate predictive methodology depending on the similarity between training and test datasets. We compare the effectiveness of these models in predicting the binding affinity and expression of SARS-CoV-2 spike protein sequences. We also explore how effective these predictive methods are when trained on laboratory-created data and are tasked with predicting the binding affinity of the in-the-wild SARS-CoV-2 spike protein sequences obtained from the GISAID datasets. We observe that TR is a better method when the sample size is small and test protein sequences are sufficiently similar to the training sequence. However, when the training sample size is sufficiently large and prediction requires extrapolation, LSTM embedding and CNN-based predictive model show superior performance.
