A pangenome reference of 36 Chinese populations

Yang Gao, Shuhua Xu, Li Jin, Chang Lei, Xiaohan Zhao, Songyang Li, Lian Deng, Sen Ma, Yimin Wang, Baonan Wang, Xinjiang Tan, Hao Chen, Zhaoqing Yang, Binyin Shi, Kai Ye, Shaoyuan Wu, Zhibin Hu, Jiayou Chu, Xiaofei Yang, Yuwen Pan, Shuang Kong, Ziyi Yang, Yutong Cui, Yan Lü, Dong‐Dong Wu, Han‐Dong Sun, Yun Shi, Xing‐Ming Zhao

doi:10.1038/s41586-023-06173-7

Abstract

Human genomics is witnessing an ongoing paradigm shift from a single reference sequence to a pangenome form, but populations of Asian ancestry are underrepresented. Here we present data from the first phase of the Chinese Pangenome Consortium (CPC), including a collection of 116 high-quality, haplotype-phased de novo assemblies based on 58 core samples representing 36 minority Chinese ethnic groups. With an average 30.65× high-fidelity long-read sequence coverage, an average contiguity N50 of more than 35.63 megabases and an average total size of 3.01 gigabases, the CPC core assemblies add 189 million base pairs of euchromatic polymorphic sequences and 1,367 protein-coding gene duplications to GRCh38. We identified 15.9 million small variants and 78,072 structural variants, of which 5.9 million small variants and 34,223 structural variants were not reported in a recently released pangenome reference (ref. 1). The CPC data demonstrate a remarkable increase in the discovery of novel and missing sequences when individuals from underrepresented minority ethnic groups are included. The missing reference sequences were enriched with archaic-derived alleles and with genes that confer essential functions related to keratinization, response to ultraviolet radiation, DNA repair, immunological responses and lifespan, implying great potential for shedding new light on human evolution and for recovering missing heritability in complex disease mapping.
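The abstract summarizes assembly quality with two standard metrics: contiguity N50 and fold sequencing coverage. The sketch below shows the usual textbook definitions of both. It is not the CPC's own pipeline, and the contig lengths and base counts used with it are made-up toy values, not CPC data. The function names `n50` and `mean_coverage` are hypothetical helpers chosen for illustration.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    contain at least half of all assembled bases.
    """
    lengths = sorted(contig_lengths, reverse=True)  # longest contigs first
    if not lengths:
        raise ValueError("need at least one contig length")
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length  # cumulative bases covered by contigs seen so far
        if running >= half:
            return length


def mean_coverage(total_read_bases, genome_size):
    """Average fold coverage: total sequenced bases divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_read_bases / genome_size


if __name__ == "__main__":
    # Toy contig lengths (hypothetical); the total is 54, so half is 27.
    # The cumulative sums 10, 19, 27 reach half at the contig of length 8.
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8
    # Hypothetical example: 90 Gb of reads over a 3.0 Gb genome.
    print(mean_coverage(90e9, 3.0e9))  # 30.0
```

Because N50 is weighted by bases rather than by contig count, a handful of very long contigs can dominate it. That is why it serves as a contiguity summary and not as a measure of typical contig length. Note also that 116 assemblies from 58 samples corresponds to two phased haplotype assemblies per sample.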


Journal: Nature
Publication Date: Jun 14, 2023
Citations: 52
License type: CC BY 4.0

Similar Papers

Discovery of common sequences absent in the human reference genome using pooled samples from next generation sequencing
Yu Liu ... Richard S Cooper
BMC Genomics, Vol. 15, 01 Jan 2014

NanotatoR: a tool for enhanced annotation of genomic structural variants
Surajit Bhattacharya ... Emmanuèle C Délot
BMC Genomics, Vol. 22, 06 Jan 2021

Abstract LB_A03: Proximity ligation sequencing reveals novel and recurrent structural genomic variants in FFPE pancreatic ductal adenocarcinoma samples
Abhishek Pandey ... Kathleen Torko
Molecular Cancer Therapeutics, Vol. 22, 01 Dec 2023

Author response: Tiled-ClickSeq for targeted sequencing of complete coronavirus genomes with simultaneous capture of RNA recombination and minority variants
Elizabeth Jaworski ...
03 Sep 2021
