Abstract
Copy number changes in protein-coding genes are detrimental if the resulting changes in protein concentration disrupt essential cellular functions. The dosage sensitivity of transcription factor (TF) genes is of particular interest because their products are central to regulating the expression of genetic information. From four recently curated data sets of dosage-sensitive genes (genes with conserved copy numbers across mammals, ohnologs, and two data sets of haploinsufficient genes), we compiled a data set of the most reliable dosage-sensitive (MRDS) genes and a data set of the most reliable dosage-insensitive (MRDIS) genes. The MRDS genes were those present in all four data sets, while the MRDIS genes were those absent from all four data sets and with probability of being loss-of-function intolerant (pLI) values < 0.5 in both haploinsufficient gene data sets. Enrichment analysis of TF genes among the MRDS and MRDIS gene data sets showed that TF genes are more likely to be dosage-sensitive than other genes in the human genome. The nuclear receptor family was the most enriched TF family among the dosage-sensitive genes, and TF families with very few members were also more likely to be dosage-sensitive than larger TF families. In addition, we found a number of dosage-insensitive TFs, the most typical being the Krüppel-associated box domain-containing zinc-finger proteins (KZFPs). Gene ontology (GO) enrichment analysis showed that the MRDS TFs were enriched for many more terms than the MRDIS TFs; however, the proteins interacting with these two groups of TFs did not show such sharp differences. Furthermore, the MRDIS KZFPs were not significantly enriched for any GO term, whereas their interacting proteins were significantly enriched for thousands of GO terms.
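The MRDS/MRDIS selection rules and the TF enrichment test can be sketched as set operations plus a one-sided hypergeometric tail (equivalent to a one-sided Fisher's exact test). This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names, the `score_a`/`score_b` dictionaries, the default of excluding genes with no score, and the choice of test are all hypothetical, and the toy gene labels are made up.

```python
from math import comb

def build_mrds_mrdis(all_genes, ds_sets, score_a, score_b, cutoff=0.5):
    """Split genes into MRDS and MRDIS sets (hypothetical helper).

    all_genes : set of gene symbols considered (assumed input)
    ds_sets   : the four dosage-sensitive gene sets (conserved copy number,
                ohnologs, and two haploinsufficient gene data sets)
    score_a/b : dicts gene -> pLI-like score from the two haploinsufficient
                data sets (assumed names)
    """
    mrds = set.intersection(*ds_sets)   # present in all four data sets
    in_any = set.union(*ds_sets)        # present in at least one data set
    # Dosage-insensitive candidates: absent from all four data sets AND scored
    # below the cutoff in both haploinsufficiency data sets. A missing score
    # defaults to 1.0, so unscored genes are conservatively excluded (assumption).
    mrdis = {
        g for g in all_genes - in_any
        if score_a.get(g, 1.0) < cutoff and score_b.get(g, 1.0) < cutoff
    }
    return mrds, mrdis

def enrichment_p(k, N, K, n):
    """One-sided P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: genes in the background, K: TF genes in the background,
    n: genes in the test set (e.g. MRDS), k: TF genes in the test set.
    """
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

# Toy example with made-up gene labels (not real results).
genes = set("ABCDEFGH")
sets = [set("ABC"), set("ABD"), set("AB"), set("ABE")]
mrds, mrdis = build_mrds_mrdis(genes, sets,
                               {"F": 0.1, "G": 0.2, "H": 0.9},
                               {"F": 0.3, "G": 0.6, "H": 0.1})
tfs = set("ABF")
print(sorted(mrds), sorted(mrdis))  # ['A', 'B'] ['F']
print(enrichment_p(len(mrds & tfs), len(genes), len(tfs), len(mrds)))
```

The same `enrichment_p` call applied to the MRDIS set (or with the lower tail) would indicate depletion rather than enrichment; a real analysis would also need a multiple-testing correction when many families are tested.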
Further characterization revealed significant differences between MRDS TFs and MRDIS TFs in the length and nucleotide composition of their DNA-binding sites, as well as in expression level, protein size, and selective pressure.
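Comparing binding-site length and nucleotide composition between two TF groups can be illustrated with GC content per site and a rank-based statistic. The sketch below assumes a Mann-Whitney U comparison, which this section does not name, so treat it as one plausible choice; the binding sites are invented placeholders, not data from the study.

```python
def gc_content(site):
    """Fraction of G/C bases in a DNA-binding site sequence (case-insensitive)."""
    site = site.upper()
    if not site:
        raise ValueError("empty binding site")
    return sum(base in "GC" for base in site) / len(site)

def mann_whitney_u(x, y):
    """Mann-Whitney U statistic of sample x against sample y.

    Pairs (a, b) with a > b count 1 and ties count 0.5,
    so U_x + U_y == len(x) * len(y).
    """
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)

# Made-up binding sites for two hypothetical TF groups (illustration only).
group_1 = ["GGGCGG", "CCGCCC", "GCGTGA"]
group_2 = ["TAATTA", "ATGCAA", "TTAGGA"]
gc_1 = [gc_content(s) for s in group_1]
gc_2 = [gc_content(s) for s in group_2]
len_1 = [len(s) for s in group_1]
len_2 = [len(s) for s in group_2]
print(mann_whitney_u(gc_1, gc_2), mann_whitney_u(len_1, len_2))  # 9.0 4.5
```

The statistic alone gives no p-value; with SciPy available, `scipy.stats.mannwhitneyu(x, y, alternative="two-sided")` returns both the statistic and a p-value for the same comparison.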
Highlights
Gene duplication and loss in evolution and gene copy number polymorphisms at the population level have been widely observed in both animals and plants (Innan and Kondrashov, 2010; Schrider and Hahn, 2010; Panchy et al, 2016).
According to their sensitivity to loss-of-function (LoF) variation, human genes can be assigned to one of three natural categories: null (LoF variation is tolerated in both the heterozygous and homozygous states), recessive (heterozygous LoF variation is tolerated by natural selection but homozygous loss-of-function (HLOF) variation is not), and haploinsufficient (even heterozygous LoF variation is selected against).
1,570 transcription factor (TF) genes were categorized into 64 families according to the DNA-binding domains (DBDs) they encoded.
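Grouping TFs into families by DBD and summarizing how many members of each family are dosage-sensitive is a simple dictionary-of-sets operation. This sketch assumes each TF maps to exactly one DBD family label (TFs with several DBD types would need an extra rule); the gene names and memberships are placeholders, not results. Each family's count could then be tested for over-representation, for example with a hypergeometric test.

```python
from collections import defaultdict

def families_by_dbd(tf_to_dbd):
    """Group TF genes into families keyed by DBD type.

    Assumes each TF is mapped to a single DBD family label (hypothetical input).
    """
    families = defaultdict(set)
    for tf, dbd in tf_to_dbd.items():
        families[dbd].add(tf)
    return dict(families)

def family_summary(families, mrds):
    """Per family: (number of members, members in MRDS, fraction in MRDS)."""
    summary = {}
    for dbd, members in families.items():
        hits = len(members & mrds)
        summary[dbd] = (len(members), hits, hits / len(members))
    return summary

# Toy mapping with placeholder gene names (illustration only, not data).
tf_to_dbd = {
    "TF1": "Nuclear receptor", "TF2": "Nuclear receptor",
    "TF3": "C2H2 ZF", "TF4": "C2H2 ZF", "TF5": "C2H2 ZF",
}
families = families_by_dbd(tf_to_dbd)
for dbd, row in sorted(family_summary(families, {"TF1", "TF2", "TF4"}).items()):
    print(dbd, row)
```

Reporting family size next to the MRDS fraction makes it easy to check whether small families are over-represented among dosage-sensitive genes.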
Summary
Gene duplication and loss in evolution and gene copy number polymorphisms at the population level have been widely observed in both animals and plants (Innan and Kondrashov, 2010; Schrider and Hahn, 2010; Panchy et al, 2016). Lek et al (2016) assumed that genes evolving mostly neutrally carry the expected amount of LoF variation, and took the empirical mean observed/expected rates of LoF variation for recessive disease genes and for severe haploinsufficient genes to represent the average outcomes of the homozygous-intolerant and heterozygous-intolerant scenarios, respectively. From these, they built a three-state model and designed a metric, the probability of being LoF-intolerant (pLI). Shihab et al (2017) integrated genomic and evolutionary information from several large databases and, using a machine learning approach called HIPred, predicted 7,841 haploinsufficient genes in the human genome. This data set was considerably larger than that of Lek et al (2016), mostly because Shihab et al (2017) used a relaxed cutoff of pLI > 0.5.
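The three-state idea behind pLI can be sketched as a mixture model: a gene's observed LoF count is treated as Poisson with mean equal to its expected count times a state-specific observed/expected ratio, and pLI is the posterior probability of the haploinsufficient state. This is a minimal sketch, not Lek et al's code: the Poisson likelihood, the EM fit of the mixture weights, and the rates (1.0, 0.5, 0.1) are assumptions chosen for illustration, and the gene counts are made up.

```python
import math

def log_poisson(k, mu):
    """log P(K = k) for K ~ Poisson(mu), mu > 0."""
    return k * math.log(mu) - mu - math.lgamma(k + 1)

def state_posteriors(obs, exp, rates, weights):
    """Posterior probability of each state (null, recessive, haploinsufficient).

    obs: observed LoF count; exp: expected LoF count under neutrality (> 0).
    rates: assumed obs/exp ratio per state; weights: mixture weights.
    """
    logs = [
        (math.log(w) if w > 0 else -math.inf) + log_poisson(obs, r * exp)
        for r, w in zip(rates, weights)
    ]
    top = max(logs)
    unnorm = [math.exp(v - top) for v in logs]  # log-sum-exp for stability
    total = sum(unnorm)
    return [u / total for u in unnorm]

def fit_weights(genes, rates, n_iter=50):
    """EM for the mixture weights with fixed rates; genes = [(obs, exp), ...]."""
    weights = [1.0 / len(rates)] * len(rates)
    for _ in range(n_iter):
        post = [state_posteriors(o, e, rates, weights) for o, e in genes]
        weights = [sum(p[s] for p in post) / len(genes) for s in range(len(rates))]
    return weights

def pli(obs, exp, rates, weights):
    """Posterior probability of the haploinsufficient (last) state."""
    return state_posteriors(obs, exp, rates, weights)[-1]

# Placeholder rates for (null, recessive, haploinsufficient): the null state
# keeps the expected amount of LoF variation (ratio 1.0); the other two values
# are illustrative stand-ins for the empirical means described above.
RATES = (1.0, 0.5, 0.1)
toy_genes = [(0, 50.0)] * 5 + [(50, 50.0)] * 5  # made-up (obs, exp) counts
w = fit_weights(toy_genes, RATES)
print([round(x, 3) for x in w])
print(round(pli(0, 50.0, RATES, w), 3), round(pli(50, 50.0, RATES, w), 3))
```

A gene with no observed LoF variants despite many expected ones gets a pLI near 1, while a gene carrying its full expected load gets a pLI near 0; a cutoff such as pLI > 0.5 then turns the posterior into a haploinsufficient/tolerant call.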