Molecular characterization of SARS-CoV-2 from Bangladesh: implications in genetic diversity, possible origin of the virus, and functional significance of the mutations

S.M Shahriar Rizvi, Md Marufur Rahman, Shirmin Bintay Kader

doi:10.1016/j.heliyon.2021.e07866

Open Access

https://doi.org/10.1016/j.heliyon.2021.e07866

Abstract

In an effort to understand the pathogenesis, evolution and epidemiology of SARS-CoV-2, scientists around the world are tracking its genomic changes in real time. Genomic studies can help in understanding the disease dynamics. We downloaded 324 complete and near-complete SARS-CoV-2 genomes from Bangladesh that had been submitted to the GISAID database and were isolated between 30 March and 7 September 2020. We then compared these genomes with the Wuhan reference sequence and found 4160 mutation events, including 2253 missense single nucleotide variations, 38 deletions and 10 insertions. The C>T nucleotide change was the most prevalent (41% of all mutations), possibly due to selective mutation pressure to reduce CpG sites and thereby evade the CpG-targeted host immune response. The most frequent mutation, found in 98% of isolates, was 3037C>T, a synonymous change that usually accompanied three other mutations: 241C>T, 14408C>T (P323L in RdRp) and 23403A>G (D614G in the spike protein). P323L has been reported to increase the mutation rate, and D614G is associated with increased viral replication and is currently the most prevalent variant circulating worldwide. We identified multiple missense mutations in predicted B-cell and T-cell epitope regions and/or PCR target regions (including R203K and G204R, which occurred in 86% of the isolates) that may affect immunogenicity and/or RT-PCR-based diagnosis. Our analysis revealed 5 large deletion events in the ORF7a and ORF8 gene products that may be associated with reduced disease severity and increased viral clearance. Our phylogenetic analysis showed that most of the isolates belonged to Nextstrain clade 20B (86%) and GISAID clade GR (88%). Most of our isolates shared common ancestors either directly with European countries or jointly with Middle Eastern countries as well as Australia and India. Interestingly, the 19B clade (GISAID clade S), which was originally prevalent in China, was unique to Chittagong. This suggests possible multiple introductions of the virus into Bangladesh via different routes. Hence, more genome sequencing and analysis with related clinical data are needed to interpret functional significance and better predict the disease dynamics, which may help policy makers control the COVID-19 pandemic.
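
The kind of comparison summarized above (tallying nucleotide changes against a reference and asking how many of them are C>T, and how they affect CpG sites) can be illustrated with a minimal sketch. This is not the pipeline used in the study: the function names `substitution_spectrum`, `change_fraction` and `count_cpg` are hypothetical, the two short sequences are invented toy data, and the sketch assumes the sample has already been aligned to the reference so that both strings have the same length.

```python
from collections import Counter

VALID_BASES = set("ACGT")


def substitution_spectrum(reference: str, sample: str) -> Counter:
    """Count single-nucleotide substitutions (e.g. 'C>T') between two
    pre-aligned sequences of equal length. Gaps and ambiguous bases
    (anything other than A, C, G, T) are skipped."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    counts = Counter()
    for ref_base, alt_base in zip(reference.upper(), sample.upper()):
        if ref_base in VALID_BASES and alt_base in VALID_BASES and ref_base != alt_base:
            counts[f"{ref_base}>{alt_base}"] += 1
    return counts


def change_fraction(counts: Counter, change: str) -> float:
    """Share of all counted substitutions that are of the given type."""
    total = sum(counts.values())
    return counts[change] / total if total else 0.0


def count_cpg(sequence: str) -> int:
    """Number of CpG dinucleotides (a C immediately followed by a G)."""
    return sequence.upper().count("CG")


# Toy data only (hypothetical, not real SARS-CoV-2 sequence).
toy_reference = "ACGTCGATCA"
toy_sample = "ATGTTGGTCA"

spectrum = substitution_spectrum(toy_reference, toy_sample)
print(dict(spectrum))
print(round(change_fraction(spectrum, "C>T"), 2))
print(count_cpg(toy_reference), count_cpg(toy_sample))
```

In the toy pair, both C>T changes fall on the C of a CpG dinucleotide, so the sample ends up with fewer CpG sites than the reference. That is the pattern the CpG-depletion hypothesis above would predict, although a real analysis would need whole-genome alignments and proper variant annotation.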

Highlights

The world is suffering from COVID-19, a devastating pandemic caused by a novel coronavirus originating from Wuhan, China (Zhou et al, 2020)
Pachetti et al. identified multiple mutation hotspots with geographic specificity. They identified mutations in the RNA-dependent RNA polymerase (RdRp) gene, which are important because the RdRp protein is the target of some proposed antiviral drugs, and mutations in the gene may help the virus escape those drugs (Pachetti et al, 2020)
These are freely available online bioinformatic tools that have been validated to identify and reassemble novel coronavirus isolates. We identified both nucleotide and amino acid mutations and similarities relative to the SARS-CoV-2 (NCBI Taxonomy ID: 2697049) reference sequences NC_045512 (NCBI) and EPI_ISL_402124 (GISAID)

Summary

Introduction

The world is suffering from COVID-19, a devastating pandemic caused by a novel coronavirus originating from Wuhan, China (Zhou et al, 2020). Genomic sequencing revealed that the SARS-CoV-2 genome is ~30 kb long. Pachetti et al. identified multiple mutation hotspots with geographic specificity. They identified mutations in the RNA-dependent RNA polymerase (RdRp) gene, which are important because the RdRp protein is the target of some proposed antiviral drugs, and mutations in the gene may help the virus escape those drugs (Pachetti et al, 2020). Studies that integrate all the deletions across the whole genome of SARS-CoV-2 globally are still lacking. Such studies may contribute to understanding the pathogenic dynamics of the virus over time. The genetic differences among SARS-CoV-2 strains from different locations can be linked with their geographical distributions (Islam et al, 2020).

Methods

Results

Conclusion