Abstract

Genomic studies of glioma, a malignant brain tumor, invariably rely on aligning sequencing reads to a reference genome such as HG38, even though no single reference can capture the genetic diversity of an individual patient or cell line. To begin addressing this reference bias in genomics research, we constructed a chromosome-scale, diploid, phased assembly of a normal human astrocyte (NHA) cell line that is frequently used as the basis for isogenic glioma cell models. We generated 45x Pacific Biosciences HiFi data (N50: 16 kb), 26x Oxford Nanopore Technologies ultra-long data (N50: 80 kb), and 2 billion paired-end reads of Hi-C data, and used the assembler verkko2 to integrate these data into a phased diploid assembly. The assembly spans 6.01 Gbp and includes 17 gapless chromosomes. The two haplotypes contain 214 and 182 contigs, respectively. The assembly includes 76 fully assembled telomeres and 20 fully assembled centromeres. Its N50 is 130 Mbp, nearly twice that of HG38 (N50: 68 Mbp) and approaching that of the recent telomere-to-telomere CHM13 assembly (N50: 155 Mbp). We ran BUSCO on each haplotype using the metazoa database to quantify the single-copy genes expected to be present in every animal genome. The two haplotypes contain the complete sequence of 98.6% and 99.9% of these genes, respectively. Finally, we used Merqury to perform a k-mer-based evaluation of the two haplotypes and obtained quality values (QV) of 59.09 and 59.75, respectively. This diploid genome assembly of NHA will serve as a resource for researchers who wish to develop NHA-derived glioma models. It can be used to call indels and structural variants in a haplotype-specific manner, which will further elucidate the genotypes underlying glioma phenotypes. Most importantly, it sets the stage for personalized genome assembly and for addressing the lack of diversity in genomic studies of glioma to date.
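For readers less familiar with the two summary metrics quoted above, the following is a minimal sketch of how they are conventionally defined; it is not the authors' pipeline. N50 is the length L such that sequences of length L or longer account for at least half of the total bases, and a Phred-scaled quality value QV corresponds to a per-base error probability of 10^(-QV/10). The function names `n50` and `qv_to_error_rate` and the toy lengths are illustrative assumptions.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of sequence lengths.

    Conventional definition: sort lengths from longest to shortest and
    return the length at which the running total first reaches half of
    the summed length.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("n50 requires at least one length")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return ordered[-1]  # not reached for positive lengths; kept for safety


def qv_to_error_rate(qv: float) -> float:
    """Convert a Phred-scaled quality value to a per-base error probability."""
    return 10 ** (-qv / 10)


if __name__ == "__main__":
    # Hypothetical toy contig lengths, not data from the assembly.
    toy_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
    print("toy N50:", n50(toy_lengths))
    # QV values as reported in the abstract for the two haplotypes.
    for qv in (59.09, 59.75):
        print(f"QV {qv} -> error rate {qv_to_error_rate(qv):.3e}")
```

Under this definition, a QV close to 60 corresponds to roughly one consensus error per million bases.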