Abstract

The genomic variation of the populations of the Italian peninsula is currently under-characterised: the only Italian whole-genome reference is represented by the Tuscans from the 1000 Genomes Project. To address this issue, we sequenced a total of 947 Italian samples from three different geographical areas. First, we defined a new Italian Genome Reference Panel (IGRP1.0) for imputation, which improved imputation accuracy, especially for rare variants, and we tested it by GWAS analysis on red blood traits. Furthermore, we extended the catalogue of genetic variation by investigating the level of population structure, the pattern of natural selection, the distribution of deleterious variants and the occurrence of human knockouts (HKOs). Overall, the results demonstrate a high level of genomic differentiation between cohorts, different signatures of natural selection and a distinctive distribution of deleterious variants and HKOs, confirming the necessity of distinct genome references for the Italian population.

Highlights

  • Large sequencing projects have identified the majority of common variants and millions of rare and low-frequency variants

  • As already shown in previous works [13,14,15], the addition of study-specific WGS data increases the accuracy of imputation for low-frequency variants (MAF < 1%), providing a cost-effective way to improve the power and resolution of genome-wide association studies (GWAS) and to help identify population-specific variants in different Italian and possibly Southern European populations; notably, we increase the total number of variants that are valuable for GWAS in INGI populations, as expected, and in other outbred populations in terms of imputation quality, confirming, as already shown in [1, 16], the advantages of ethnically matched reference panels

  • We can suppose that small villages with different levels of isolation may be more common in Italy than expected; for this reason, understanding the characteristics of each isolate is essential to provide a better picture of the genomic variation in the Italian peninsula

Summary

Introduction

Variants that are rare or absent elsewhere can occur at higher frequencies. In this respect, our Italian genomes could be extremely useful for the genetic analysis of other Italian and South-European populations, in a similar way as already shown in recent studies describing the advantages of WGS study-cohort based reference panels [1, 13,14,15,16].

Read depth (DP) ≥ 5; (b) all sites with AC = 1 in each cohort, either shared between at least two INGI cohorts or shared with at least one of the selected external resources (UK10K and 1000G Project Phase 3). This last match was performed by comparing position, reference and alternative allele.

We removed from each INGI cohort all the samples represented in the reference panel.
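The singleton criterion quoted above (sites with AC = 1 that are shared between INGI cohorts or with an external resource, matched on position, reference and alternative allele) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: it assumes the criterion describes sites that are retained, that each cohort's AC = 1 sites are already available as in-memory sets, and that chromosome is part of the match key alongside position and alleles. The names `Variant` and `shared_singletons` and the cohort labels are hypothetical.

```python
from typing import Dict, NamedTuple, Set


class Variant(NamedTuple):
    """Match key for a site: chromosome, position, reference and alternative allele."""
    chrom: str
    pos: int
    ref: str
    alt: str


def shared_singletons(
    cohort_singletons: Dict[str, Set[Variant]],
    external_sites: Set[Variant],
) -> Dict[str, Set[Variant]]:
    """For each cohort, return the AC = 1 sites that are supported elsewhere.

    A site counts as supported if the same (chrom, pos, ref, alt) is seen in
    at least one other cohort or in the external resources (assumption: the
    paper's exact sharing rule may differ in detail).
    """
    supported: Dict[str, Set[Variant]] = {}
    for name, sites in cohort_singletons.items():
        # Union of the sites seen in every cohort except the current one.
        others: Set[Variant] = set().union(
            *(s for n, s in cohort_singletons.items() if n != name)
        )
        supported[name] = {v for v in sites if v in others or v in external_sites}
    return supported


if __name__ == "__main__":
    # Hypothetical toy data, for illustration only.
    a = Variant("1", 100, "A", "G")
    b = Variant("1", 200, "C", "T")
    d = Variant("2", 300, "G", "C")
    cohorts = {"cohort_A": {a, b, d}, "cohort_B": {a}}
    external = {b, Variant("2", 300, "G", "T")}  # same position as d, different allele
    print(shared_singletons(cohorts, external))
```

Matching on the full tuple rather than on position alone means that two different alternative alleles at the same position are not treated as the same variant, which is the point of comparing position, reference and alternative allele together.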

Materials and methods
Results
Discussion
Compliance with ethical standards