Abstract

Human Endogenous Retroviruses (HERVs) are genomic elements that result from ancient retroviral infections of the human germline. Many are biologically active and have been implicated in multiple diseases, including cancer. The most recent class to invade the human genome is the HERV-K(HML-2) (HERV-K) family. Approximately 90 HERV-K proviruses and many smaller elements have been identified in the human genome to date, and additional proviruses continue to be discovered as deep-sequencing and long-read sequencing technologies advance. HERV-K proviruses are poorly annotated in human transcriptome databases, which makes their analysis in RNA-seq data difficult. To enable this analysis, we compiled the sequences of 91 HERV-K proviruses identified in NCBI GenBank (ID JN675007-JN675097) and created a proviral alignment tool for visualizing RNA-seq reads aligned across individual proviruses. This allowed us to analyse publicly available RNA-seq data from 10 hepatoblastoma samples and 3 normal liver controls (GEO Accession ID: GSE89775). This data report includes the raw FASTA sequence files of the HERV-K proviruses from NCBI, a differential gene expression list between hepatoblastoma samples, and genomic alignment figures for 5 HERV-K proviruses identified as differentially expressed in the companion research article “Upregulation of Human Endogenous Retrovirus-K (HML-2) mRNAs in hepatoblastoma: Identification of potential new immunotherapeutic targets and biomarkers” [1]. The data provided here are available to other research groups interested in evaluating individual HERV-K proviral expression using RNA-seq data. Furthermore, the data analysis is flexible and can accommodate the addition of other HERV-K proviruses.
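As an illustration of how the GenBank accession range quoted above maps to the 91 proviral sequences, the following minimal Python sketch enumerates the accessions and builds a request URL for the NCBI E-utilities `efetch` endpoint. It is not the authors' pipeline: the helper names `accession_range` and `efetch_fasta_url` are hypothetical, and retrieving the records through E-utilities is an assumption made here for illustration. The sketch only constructs the URL and does not perform any network request.

```python
from urllib.parse import urlencode

# NCBI E-utilities efetch endpoint (using it here is an assumption of this
# sketch, not a method described in the data report).
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def accession_range(prefix, first, last):
    """Return consecutive accessions from first to last, inclusive (hypothetical helper)."""
    return [f"{prefix}{number}" for number in range(first, last + 1)]


def efetch_fasta_url(accessions):
    """Build an efetch URL requesting FASTA text for the given nucleotide accessions."""
    query = urlencode({
        "db": "nuccore",
        "id": ",".join(accessions),
        "rettype": "fasta",
        "retmode": "text",
    })
    return f"{EFETCH_URL}?{query}"


# Accession range quoted in the abstract: JN675007-JN675097.
accessions = accession_range("JN", 675007, 675097)
url = efetch_fasta_url(accessions)

print(len(accessions), accessions[0], accessions[-1])
```

The inclusive range contains 91 accessions, which matches the number of proviruses stated in the abstract. Additional proviruses could be appended to the `accessions` list before building the URL.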
