Abstract

Structural variants (SVs) may play important roles in human adaptation to extreme environments such as high altitude but have been under-investigated. Here, combining long-read sequencing with multiple scaffolding techniques, we assembled a high-quality Tibetan genome (ZF1), with a contig N50 length of 24.57 mega-base pairs (Mb) and a scaffold N50 length of 58.80 Mb. The ZF1 assembly filled 80 remaining N-gaps (0.25 Mb in total length) in the reference human genome (GRCh38). Notably, we detected 17,900 SVs, among which the ZF1-specific SVs are enriched in GTPase activity that is required for activation of the hypoxic pathway. Further population analysis uncovered a 163-bp intronic deletion in the MKL1 gene showing large divergence between highland Tibetans and lowland Han Chinese. This deletion is significantly associated with lower systolic pulmonary arterial pressure, one of the key adaptive physiological traits in Tibetans. Moreover, with the use of the high-quality de novo assembly, we observed a much higher rate of genome-wide archaic hominid (Altai Neanderthal and Denisovan) shared non-reference sequences in ZF1 (1.32%–1.53%) compared to other East Asian genomes (0.70%–0.98%), reflecting a unique genomic composition of Tibetans. One such archaic hominid shared sequence, a 662-bp intronic insertion in the SCUBE2 gene, is enriched in Tibetans and associated with better lung function (the FEV1/FVC ratio). Collectively, we generated the first high-resolution Tibetan reference genome, and the identified SVs may serve as valuable resources for future evolutionary and medical studies.
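For readers unfamiliar with the N50 statistic quoted above, the following minimal sketch shows how it is conventionally computed: sort the contig (or scaffold) lengths from longest to shortest and report the length at which the running total first reaches half of the summed length. The function name `n50` and the toy lengths are illustrative assumptions; this is not the pipeline used to produce the ZF1 figures.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembled length.
    """
    if not lengths:
        raise ValueError("lengths must be non-empty")
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy example (hypothetical contig lengths, arbitrary units):
# total = 54, half = 27; 10 + 9 + 8 = 27, so N50 is 8.
toy_contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(toy_contigs))
```

Under this definition, a contig N50 of 24.57 Mb means that contigs of at least that length account for half or more of the assembled bases.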

Highlights

  • Next-generation sequencing (NGS) is a powerful tool for studying human genomic variation through simple alignment of short reads to a reference genome

  • By comparing with two previous long-read Asian genome assemblies (AK1 and HX1), we identified a large number of novel structural variants (SVs), some of which are enriched in Tibetans and are associated with pulmonary arterial pressure and lung function

  • De novo assembly of the Tibetan genome and gap filling on the reference genome: we performed single-molecule real-time (SMRT) long-read sequencing using PacBio RSII at 70× coverage and obtained a total of 24.9M subreads with median and mean read lengths of 9.5 kb and 10.3 kb, respectively (Supplementary Figure 1)


Summary

Introduction

Next-generation sequencing (NGS) is a powerful tool for studying human genomic variation through simple alignment of short reads to a reference genome. Tibetans represent a unique highland population permanently living on the Tibetan Plateau (average elevation >4,000 meters), one of the most extreme environments on Earth. Their permanent settlement on the Qinghai-Tibetan Plateau was dated to as early as 30,000 years ago based on genetic data (Shi et al. 2008; Qi et al. 2013; Lu et al. 2016). It was proposed that the Tibetan-enriched EPAS1 variants were inherited from a Denisovan-like hominid (ref. 19). This evidence suggests that the high-altitude adaptation of Tibetans is probably multifaceted, involving different types of genomic variation.
