Copy number variations (CNVs) have provided a dynamic aspect to the apparently static human genome. We have analyzed CNVs larger than 100 kb in 477 healthy individuals from 26 diverse Indian populations of different linguistic, ethnic and geographic backgrounds. These CNVRs were identified using the Affymetrix 50K Xba 240 Array. We observed 1,425 and 1,337 CNVRs in the deletion and amplification sets, respectively, after pooling data from all the populations. More than 50% of the genes encompassed entirely in CNVs had both deletions and amplifications. There was wide variability across populations not only with respect to CNV extent (ranging from 0.04-1.14% of genome under deletion and 0.11-0.86% under amplification) but also in terms of functional enrichments of processes like keratinization, serine proteases and their inhibitors, cadherins, homeobox, olfactory receptors etc. These did not correlate with linguistic, ethnic, geographic backgrounds and size of populations. Certain processes were near exclusive to deletion (serine proteases, keratinization, olfactory receptors, GPCRs) or duplication (homeobox, serine protease inhibitors, embryonic limb morphogenesis) datasets. Populations having same enriched processes were observed to contain genes from different genomic loci. Comparison of polymorphic CNVRs (5% or more) with those cataloged in Database of Genomic Variants revealed that 78% (2473) of the genes in CNVRs in Indian populations are novel. Validation of CNVs using Sequenom MassARRAY revealed extensive heterogeneity in CNV boundaries. Exploration of CNV profiles in such diverse populations would provide a widely valuable resource for understanding diversity in phenotypes and disease.
Read full abstract