The E6 and E7 proteins of human papillomavirus (HPV) play a key role in the oncogenesis of papillomavirus infection. Data on the variability of these proteins are limited, and the factors that shape it are poorly understood. We analyzed the variability of all currently known sequences of the HPV type 16 (HPV16) E6 and E7 proteins with respect to their geographic origin and year of sample collection, as well as the direction of their evolution in the major geographic regions of the world. All HPV16 sequences covering the genome regions that encode the E6 and E7 oncoproteins were downloaded from the NCBI GenBank database on October 6, 2022. Sequences were retained only if they included at least one of the two complete open reading frames (ORFs) and had a known collection date and country of origin. The final sample comprised 3,651 nucleotide sequences containing the complete E6 ORF and 4,578 containing the complete E7 ORF. After alignment, the nucleotide sequences were translated into amino acid sequences and analyzed using MEGA11, R, RStudio, jModelTest 2.1.20, BEAST v1.10.4, Fastcov, and Biostrings. The highest variability in the E6 protein was observed at positions 17, 21, 32, 85, and 90, and in E7 at positions 28, 29, 51, and 77. The samples were divided geographically into five heterogeneous groups: African, European, American, Southwest and South Asian, and Southeast Asian. For a number of countries, we identified unique amino acid substitutions (AA-substitutions) in the HPV16 E6/E7 proteins that are presumably characteristic of certain ethnic groups. These substitutions are located mainly within known B- and T-cell epitopes and only rarely within structural and functional domains.
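The filtering, translation, and per-position variability steps described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' pipeline (which used MEGA11, R, and Biostrings): the `Record` layout and all function names are hypothetical, the ORF-completeness check (ATG start, in-frame length, a single terminal stop codon) is one reasonable heuristic, and per-position variability is measured here with Shannon entropy, a common choice that the abstract does not specify. The toy sequences are invented and are not HPV16 data.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import Optional

# Standard genetic code; codons enumerated in TCAG order (first, second, third base).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


@dataclass
class Record:
    # Hypothetical record layout; parsing real GenBank entries is out of scope.
    accession: str
    orf: str                        # nucleotide sequence of the E6 or E7 ORF
    country: Optional[str]
    collection_date: Optional[str]


def is_whole_orf(nt: str) -> bool:
    """Heuristic completeness check: ATG start, in-frame length, one terminal stop.
    Codons with ambiguous bases (e.g. N) map to 'X' and cause rejection."""
    nt = nt.upper()
    if len(nt) < 6 or len(nt) % 3 != 0 or not nt.startswith("ATG"):
        return False
    aa = [CODON_TABLE.get(nt[i:i + 3], "X") for i in range(0, len(nt), 3)]
    return aa[-1] == "*" and "*" not in aa[:-1] and "X" not in aa


def translate(nt: str) -> str:
    """Translate an in-frame ORF and drop the terminal stop codon."""
    nt = nt.upper()
    protein = "".join(CODON_TABLE.get(nt[i:i + 3], "X") for i in range(0, len(nt) - 2, 3))
    return protein.rstrip("*")


def keep(rec: Record) -> bool:
    # Filtering criteria stated in the abstract: whole ORF, known country, known date.
    return bool(rec.country) and bool(rec.collection_date) and is_whole_orf(rec.orf)


def positional_entropy(proteins: list[str]) -> list[float]:
    """Shannon entropy (bits) at each position of equal-length, aligned proteins."""
    length = len(proteins[0])
    assert all(len(p) == length for p in proteins), "sequences must be aligned"
    result = []
    for pos in range(length):
        counts = Counter(p[pos] for p in proteins)
        n = sum(counts.values())
        result.append(sum((c / n) * math.log2(n / c) for c in counts.values()))
    return result


# Toy data (invented): two records pass the filter, two are rejected.
records = [
    Record("A1", "ATGTTTCAGGACTAA", "Country X", "2015"),
    Record("A2", "ATGTTTCATGACTAA", "Country Y", "2016"),
    Record("A3", "ATGTTTCAGGACTAA", None, "2017"),         # rejected: no country
    Record("A4", "ATGTAACAGGACTAA", "Country Z", "2018"),  # rejected: internal stop
]
kept = [r for r in records if keep(r)]
proteins = [translate(r.orf) for r in kept]
entropy = positional_entropy(proteins)
max_h = max(entropy)
# 1-based positions with the highest (non-zero) entropy.
most_variable = [i + 1 for i, h in enumerate(entropy) if h == max_h and h > 0]
print(most_variable)  # [3]
```

In a real analysis, entropy would typically be computed separately for each geographic group so that region-specific substitutions stand out against the pooled profile.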
The observed differences in AA-substitutions between ethnic groups, together with their colocalization with clusters of B- and T-cell epitopes, suggest a possible link with the geographic distribution of alleles and haplotypes of the human leukocyte antigen (HLA) system, the human major histocompatibility complex. Different HLA backgrounds may lead to recognition of different sets of viral B- and T-cell epitopes, producing regional differences in the direction of epitope drift. Phylogenetic analysis of the nucleotide sequences encoding HPV16 E6 revealed a common ancestor, confirmed regional clustering of E6 gene sequences by their set of the most common AA-substitutions, and identified cases in which individual AA-substitutions reverted when the virus spread to a different region. A similar analysis was not possible for E7 because of the high homology of its sequences. Covariance analysis of the pooled sample revealed no covariation between amino acid positions within E6, within E7, or between E6 and E7. These data are relevant to the development of therapeutic vaccines against high-risk HPV types.
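The covariance step can be illustrated with pairwise mutual information (MI) between alignment columns, computed on per-sample concatenations of E6 and E7 so that within-E6, within-E7, and between-protein pairs are all tested in one pass. This is a swapped-in stand-in, not the algorithm implemented in Fastcov; the function names, the threshold value, and the toy data are all hypothetical.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Sequence


def column(proteins: Sequence[str], pos: int) -> list[str]:
    return [p[pos] for p in proteins]


def mutual_information(x: Sequence[str], y: Sequence[str]) -> float:
    """MI in bits between two aligned columns: sum p(a,b) * log2(p(a,b) / (p(a) p(b)))."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum((c / n) * math.log2(c * n / (px[a] * py[b])) for (a, b), c in pxy.items())


def covarying_pairs(e6: Sequence[str], e7: Sequence[str], threshold: float):
    """Hypothetical helper: (label_i, label_j, MI) for column pairs with MI >= threshold.
    e6[k] and e7[k] must come from the same sample; invariant columns are skipped."""
    assert len(e6) == len(e7), "E6 and E7 must be paired per sample"
    joint = [a + b for a, b in zip(e6, e7)]
    l6 = len(e6[0])

    def label(pos: int) -> str:
        # Convert a 0-based joint index into a 1-based protein position.
        return f"E6:{pos + 1}" if pos < l6 else f"E7:{pos - l6 + 1}"

    variable = [p for p in range(len(joint[0])) if len(set(column(joint, p))) > 1]
    hits = []
    for i, j in combinations(variable, 2):
        mi = mutual_information(column(joint, i), column(joint, j))
        if mi >= threshold:
            hits.append((label(i), label(j), mi))
    return sorted(hits, key=lambda h: -h[2])


# Toy data (invented): E6 positions 1 and 2 vary together; E7 position 2 varies weakly.
e6 = ["AK", "AK", "GR", "GR"]
e7 = ["LM", "LM", "LM", "LV"]
hits = covarying_pairs(e6, e7, threshold=0.5)
print(hits)  # [('E6:1', 'E6:2', 1.0)]
```

Raw MI is biased upward in small or unbalanced samples, so a real analysis would compare each value against a null distribution (for example, by repeatedly shuffling one column) before calling a pair covarying.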