Conservation genomics can greatly improve conservation outcomes for threatened populations, including those impacted by disease. Understanding diversity within immune gene families, including the major histocompatibility complex (MHC) and toll-like receptors (TLR), is important because of the role they play in disease resilience and susceptibility. Recent advances in sequencing technologies and bioinformatic tools have significantly decreased the cost of generating high-quality sequence data, making it possible to investigate diversity across entire gene families in large numbers of individuals, rather than in only a few genes or a few populations as previously. Here, we use the koala as a case study for investigating functional diversity across populations. We utilised previous target enrichment data and 438 whole genomes to, firstly, determine the sequencing depth required to investigate MHC diversity and, secondly, determine the current level of diversity in MHC genes in koala populations. We determined that for low-complexity, conserved genes such as TLR genes, 10× sequencing depth is sufficient to reliably genotype more than 90% of variants, whereas for complex genes such as the MHC, greater than 20× and preferably 30× sequencing depth is required. We used whole genome data to identify 270 biallelic SNPs across 24 MHC genes, as well as copy number variation (CNV) within class I and class II genes, and to conduct supertype analysis. Overall, we have provided a bioinformatic workflow for investigating variation in a complex immune gene family from whole genome sequencing data and determined current levels of diversity within koala MHC genes.