Simple sequence repeats (SSRs) occur in both coding and non-coding regions of the genomes of all organisms, and they vary in incidence, complexity, and repeat number. The present study focuses on Adenoviridae, a family of non-enveloped viruses with a linear double-stranded DNA genome. Members of Adenoviridae are known to cause respiratory illnesses such as the common cold (cough, runny nose, mild fever), croup and, occasionally, pneumonia, as well as keratoconjunctivitis (an eye infection commonly known as pink eye) and cystitis (inflammation of the bladder). Our investigation aims to extract and analyse SSRs from 68 Adenoviridae genomes. Virus genome sequences were retrieved from NCBI, and SSRs were extracted using MISA. ETE3 and iTOL were used for phylogenetic tree construction, annotation and visualization. Genome length ranged from 26163 bp to 45667 bp, while GC content varied from 33.6% to 66.9%. Genome-wide analysis revealed a total of 9861 SSRs and 793 compound SSRs (cSSRs). SSR incidence per genome ranged from 112 (A67) to 203 (A61). The most prevalent mono-, di- and tri-nucleotide SSR motifs were “A”, “GC/CG” and “GAG/CTC”, with 1177, 1859 and 201 occurrences, respectively. About 78% of the SSRs in the studied genomes lie in coding regions. In terms of protein-specific distribution, the DNA polymerase gene had the highest incidence, with 485 SSRs. The presence of mono-SSRs in the A/T region is a marker for host determination and divergence. On average, 67.14% of mono-SSRs were present in the A/T region, with values ranging from 16.22% to 99.11% across genomes. A high prevalence of mono-SSRs in the A/T region was associated with human and related species as hosts. Further, viruses clustered according to their hosts in the phylogenetic tree, suggesting a role of the host in viral evolution. The presence of unique and conserved cSSRs as genome markers is also highlighted.
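To illustrate the kind of measurement summarized above, the following is a minimal Python sketch of perfect-SSR detection and of the share of mono-SSRs whose motif is A or T. It is not MISA and does not reproduce the study's pipeline: the function names `find_ssrs` and `mono_at_percent` are hypothetical, the minimum repeat counts are placeholder assumptions (the abstract does not state the MISA settings used), and reading "mono-SSRs in the A/T region" as "mono-SSRs with an A or T motif" is likewise an assumption.

```python
import re

# Assumed minimum repeat counts per motif length (mono, di, tri).
# These are illustrative placeholders, not the thresholds used in the study.
MIN_REPEATS = {1: 6, 2: 3, 3: 3}


def _pattern(size, min_rep):
    """Regex for a motif of `size` bases repeated at least `min_rep` times."""
    # For motifs longer than one base, reject homopolymer motifs such as "AA",
    # so that a run of A's is reported only once, as a mono-SSR.
    guard = r"(?!(.)\2{%d})" % (size - 1) if size > 1 else ""
    return re.compile(r"(%s[ACGT]{%d})\1{%d,}" % (guard, size, min_rep - 1))


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (motif, start, end) for each perfect SSR, 0-based, end-exclusive.

    Simplification: motif classes are scanned independently, so a mono-SSR
    and a di-SSR may share a boundary base.
    """
    seq = seq.upper()
    hits = []
    for size, min_rep in sorted(min_repeats.items()):
        for match in _pattern(size, min_rep).finditer(seq):
            hits.append((match.group(1), match.start(), match.end()))
    return hits


def mono_at_percent(hits):
    """Percentage of mono-SSRs whose motif is A or T (0.0 if there are none)."""
    mono = [motif for motif, _, _ in hits if len(motif) == 1]
    if not mono:
        return 0.0
    at_count = sum(1 for motif in mono if motif in ("A", "T"))
    return 100.0 * at_count / len(mono)


# Toy sequence for demonstration only; it is not taken from any genome.
demo = "CAAAAAAC" + "TGCGCGCT" + "AGGGGGGA"
demo_hits = find_ssrs(demo)
print(demo_hits)
print(mono_at_percent(demo_hits))
```

Applied per genome, a function like `mono_at_percent` would yield one value per virus, which could then be compared across host groups in the way the abstract describes.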