Kaposi's sarcoma-associated herpesvirus (KSHV) is the etiologic agent of Kaposi's sarcoma (KS), yet the viral genetic factors that lead to the development of KS in KSHV-infected individuals have not been fully elucidated. Nearly all previous analyses of KSHV genomic evolution and diversity have excluded the three major internal repeat regions: the two origins of lytic replication, internal repeats 1 and 2 (IR1 and IR2), and the latency-associated nuclear antigen (LANA) repeat domain (LANAr). These regions encode protein domains that are essential to the KSHV infection cycle but have rarely been sequenced due to their extended repetitive nature and high guanine and cytosine (GC) content. The limited data available suggest that their sequences and repeat lengths are more heterogeneous across individuals than in the remainder of the KSHV genome. To assess their diversity, the full-length IR1, IR2, and LANAr sequences, tagged with unique molecular identifiers (UMIs), were obtained by Pacific Biosciences' single-molecule real-time sequencing (SMRT-UMI) from twenty-four tumors and six matching oral swabs from sixteen adults in Uganda with advanced KS. Intra-host single-nucleotide variation involved an average of 0.16 per cent of base positions in the repeat regions compared to a nearly identical average of 0.17 per cent of base positions in the remainder of the genome. Tandem repeat unit (TRU) counts varied by only one from the intra-host consensus in a majority of individuals. Including the TRU indels, the average intra-host pairwise identity was 98.3 per cent for IR1, 99.6 per cent for IR2, and 98.9 per cent for LANAr. More individuals had mismatches and variable TRU counts in IR1 (twelve/sixteen) than in IR2 (two/sixteen). There were no open reading frames in the Kaposin coding sequence inside IR2 in at least fifty-five of ninety-six sequences. In summary, the KSHV major internal repeats, like the rest of the genome in individuals with KS, have low diversity. IR1 was the most variable among the repeats, and no intact Kaposin reading frames were present in IR2 of the majority of genomes sampled.